In this paper, we consider resource allocation in the 3GPP Long Term Evolution (LTE) cellular uplink, which
will be the most widely deployed next generation cellular uplink. The key features of the 3GPP LTE uplink (UL)
are that it is based on a modified form of orthogonal frequency division multiplexing based multiple access
(OFDMA), which enables channel-dependent frequency-selective scheduling, and that it allows for multi-user
(MU) scheduling, wherein multiple users can be assigned the same time-frequency resource. In addition to the
considerable spectral efficiency improvements that are possible by exploiting these two features, the LTE UL
allows for transmit antenna selection together with the possibility of employing advanced receivers at the
base-station, which promise further gains. However, several practical constraints that seek to maintain a low
signaling overhead are also imposed. In this paper, we show that the resulting resource allocation problem is
APX-hard and then propose a local ratio test (LRT) based constant-factor polynomial-time approximation algorithm.
We then propose two enhancements to this algorithm as well as a sequential LRT based MU scheduling algorithm
that offers a constant-factor approximation and is another useful choice in the complexity versus performance
tradeoff. Further, user pre-selection, wherein a smaller pool of good users is pre-selected and a sophisticated
scheduling algorithm is then employed on the selected pool, is also examined. We suggest several such user
pre-selection algorithms, some of which are shown to offer constant-factor approximations to the pre-selection
problem. Detailed evaluations reveal that the proposed algorithms and their enhancements offer significant gains.



I. Introduction 

The next generation cellular systems, a.k.a. 4G cellular systems, will operate over wideband multi-path fading
channels and have chosen OFDMA as their air-interface [1]. The motivating factors behind the choice of OFDMA
are that it is an effective means to handle multi-path fading and that it allows for enhancing multi-user diversity
gains via channel-dependent frequency-domain scheduling. The deployment of 4G cellular systems has begun
and will accelerate in the coming years. Predominantly, the 4G cellular systems will be based on the 3GPP LTE
standard [1], since an overwhelming majority of cellular operators have committed to LTE; specifically, all
deployments in the foreseeable future will adhere to the first version of the LTE standard, referred to as Release
8. Our focus in this paper is on the uplink (UL) in these Release 8 LTE based cellular systems (henceforth
referred to simply as the LTE UL) and in particular on multi-user (MU) scheduling for the LTE UL. The LTE UL
employs a modified form of OFDMA, referred to as DFT-Spread-OFDMA [1]. In each scheduling interval,
the available system bandwidth is partitioned into multiple resource blocks (RBs), where each RB represents the
minimum allocation unit and is a pre-defined set of consecutive subcarriers and OFDM symbols. The scheduler
is a frequency domain packet scheduler, which in each scheduling interval assigns these RBs to the individual
users. Anticipating a rapid growth in data traffic, the LTE UL has enabled MU scheduling along with transmit
antenna selection. Unlike single-user (SU) scheduling, a key feature of MU scheduling is that an RB can be
simultaneously assigned to more than one user in the same scheduling interval. MU scheduling is well supported
by fundamental capacity and degrees-of-freedom based analysis [2], [3], and indeed its promised gains need to
be harvested in order to cater to the ever increasing traffic demands. However, the LTE standard also places
several constraints on such MU scheduling (and the resulting MU transmissions). These constraints
seek to balance the need to provide scheduling freedom with the need to ensure a low signaling overhead and
respect device limitations. The design of an efficient and implementable MU scheduler for the LTE UL is thus
an important problem.

In Fig. 1, we highlight the key constraints in LTE MU scheduling by depicting a feasible allocation. Notice first
that all RBs assigned to a user must form a chunk of contiguous RBs and each user can be assigned at most one
such chunk. This restriction allows us to exploit frequency domain channel variations via localized assignments
(there is complete freedom in choosing the location and size of each such chunk) while respecting strict limits
on the per-user transmit peak-to-average-power-ratio (PAPR). Note also that there must be a complete overlap
among any two users that share an RB. In other words, if any two users are co-scheduled on an RB, then those
two users must be co-scheduled on all their assigned RBs. This constraint is a consequence of Zadoff-Chu (ZC)
sequences (and their cyclic shifts) being used as pilot sequences in the LTE UL [1] and is needed to ensure
reliable channel estimation. The LTE UL further assumes that each user can have multiple transmit antennas but
is equipped with only one power amplifier due to cost constraints. Accordingly, it allows a basic precoding in
the form of transmit antenna selection, where each scheduled user can be informed about the transmit antenna
it should employ in a scheduling interval. In addition, to minimize the signaling overhead, each scheduled user
can transmit with only one power level (or power spectral density (PSD)) on all its assigned RBs. This PSD is
implicitly determined by the number of RBs assigned to that user, i.e., the user divides its total power equally
among all its assigned RBs, subject possibly to a spectral mask constraint. While this constraint significantly
decreases the signaling overhead involved in conveying the scheduling decisions to the users, it does not result
in any significant performance degradation. This is because the multi-user diversity effect ensures
that each user is scheduled on the set of RBs on which it has relatively good channels, and a constant power
allocation over such good channels results in a negligible loss [4]. Finally, scheduling in the LTE UL must respect
control channel overhead constraints and interference limit constraints. The former constraints arise because the
scheduling decisions are conveyed to the users on the downlink control channel, whose limited capacity in turn
places a limit on the set of users that can be scheduled. The latter constraints are employed to mitigate intercell
interference. In the sequel, it is shown that these two types of constraints can be posed as column-sparse and
generic knapsack (linear packing) constraints, respectively.
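As an illustration of the constraints just described, the following Python sketch checks whether a candidate MU allocation is feasible with respect to contiguity, at most one chunk per user, complete overlap, and a limit on the number of users co-scheduled on an RB. The representation (a list of (user set, first RB, last RB) triples) and the names `Allocation` and `is_feasible` are our own hypothetical choices; antenna selection, power levels, and the control channel and interference constraints are deliberately left out.

```python
from typing import FrozenSet, List, Tuple

# A chunk is a contiguous RB range [head, tail] (1-indexed, inclusive);
# an allocation is a list of (co-scheduled user set, head, tail) triples.
Allocation = List[Tuple[FrozenSet[int], int, int]]


def is_feasible(alloc: Allocation, num_rbs: int, max_users_per_rb: int) -> bool:
    """Check contiguity, one chunk per user, complete overlap and the per-RB user limit."""
    seen_users = set()
    used_rbs = set()
    for users, head, tail in alloc:
        # Each chunk must be a non-empty contiguous range inside the band.
        if not users or not (1 <= head <= tail <= num_rbs):
            return False
        # At most T users may share an RB.
        if len(users) > max_users_per_rb:
            return False
        # At most one chunk per user: a user may appear in only one group.
        if seen_users & users:
            return False
        # Complete overlap: distinct co-scheduled groups may not share an RB,
        # so users that share one RB share all of their RBs.
        rbs = set(range(head, tail + 1))
        if used_rbs & rbs:
            return False
        seen_users |= users
        used_rbs |= rbs
    return True
```

Describing each co-scheduled group by a single chunk builds the complete overlap constraint into the representation: two groups may not share any RB, so two users either share all their RBs or none.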

The goal of this work is to design practical MU resource allocation algorithms for the LTE cellular uplink, 
where the term resource refers to RBs, modulation and coding schemes (MCS), power levels as well as choice of 
transmit antennas. In particular, we consider the design of resource allocation algorithms via weighted sum rate 
utility maximization, which accounts for finite user queues (buffers) and practical MCS. In addition, the designed 
algorithms comply with all the aforementioned practical constraints. Our main contributions are as follows: 

1) We show that while the complete overlap constraint along with the at most one chunk per scheduled user
constraint make the resource allocation problem APX-hard, they greatly facilitate the use of local ratio test
(LRT) based methods [5], [6]. We then design an LRT based polynomial-time deterministic constant-factor
approximation algorithm. A remarkable feature of this LRT based algorithm is that it is an end-to-end 
solution which can accommodate all constraints. 

2) We then propose an enhancement that can significantly reduce the complexity of the LRT based MU 
scheduling algorithm while offering identical performance, as well as an enhancement that can yield good 
performance improvements with a very small additional complexity. 

3) We propose a sequential LRT based MU scheduling algorithm that offers another useful choice in the
complexity versus performance tradeoff. This algorithm also offers constant-factor approximation (albeit 
with a poorer constant) and a significantly reduced complexity. 

4) In a practical system, it is useful to first pre-select a smaller pool of good users and then employ a 
sophisticated scheduling algorithm on the selected pool. Pre-selection can substantially reduce complexity 
and is also a simple way to enforce a constraint on the number of users that can be scheduled in a scheduling 
interval. We note that another way to enforce the latter constraint is via a knapsack constraint in the LRT 
based MU scheduling. We suggest several such user pre-selection algorithms, some of which are shown to 
offer constant-factor approximations to the pre-selection problem. 
5) The performance of the proposed LRT based MU scheduling algorithm together with its enhancements, 
the sequential LRT based MU scheduling algorithm and the proposed user pre-selection algorithms are 
evaluated for different BS receiver options via elaborate system level simulations that fully conform to the 
3GPP evaluation methodology. It is seen that the proposed LRT based MU scheduling algorithm along 
with an advanced BS receiver can yield over 27% improvement in cell average throughput along with
over 10% cell edge throughput improvement compared to SU scheduling. Its sequential counterpart is also 
attractive in that it yields about 20% improvement in cell average throughput while retaining the cell edge 
performance of SU scheduling. Further, it is seen that user pre-selection is indeed an effective approach 
and the suggested pre-selection approaches can offer significant gains. 

A. Related Work 

Resource allocation for OFDM/OFDMA networks has been the subject of intense research [7]-[12]. A
majority of the OFDMA resource allocation problems considered hitherto belong to the class of single-user (SU)
scheduling problems, which attempt to maximize a system utility by assigning non-overlapping subcarriers to
users, along with transmit power levels for the assigned subcarriers. Even within this class, most of the focus
has been on the downlink. These resource allocation problems have been formulated as continuous optimization
problems, which are in general non-linear and non-convex. As a result, several approaches based on game
theory [13], [14], dual decomposition, or the analysis of optimality conditions [15] have been developed.
Recent works have focused on the downlink in emerging cellular standards and have proposed approximation
algorithms after modeling the resource allocation problems as constrained integer programs. Prominent examples
are [10], [16], which consider the design of downlink SU-MIMO schedulers for LTE cellular systems and derive
constant-factor approximation algorithms.

Resource allocation for the DFT-Spread-OFDMA uplink has been relatively less studied, with [6], [17]-[22]
being the recent examples. In particular, [20] first considers a relaxed SU scheduling problem (without the
integer-valued RB allocation and the contiguity constraints) and poses the resource allocation problem as a
convex optimization problem. It then proposes a fast interior-point based method to solve that problem, followed
by a modification step to ensure contiguous allocation. A similar approach was adopted earlier in [22], where the
formulated convex optimization problem was solved via a sub-gradient method followed by a modification step
to ensure integer-valued RB allocation. Furthermore, [21] explicitly enforced the integer-valued RB allocation
constraint while formulating the resource allocation problem, but also assumed that the chunk size for each user is
given as an input, and proposed message passing based algorithms. Message passing based algorithms were also
applied in [11] over an OFDMA uplink in order to minimize the total transmit power subject to rate guarantees.
We note that while the algorithms in [20]-[22] may yield effective solutions in different regimes, they do not
offer a worst-case performance guarantee and hence cannot be claimed to be approximation algorithms.

On the other hand, [6], [17]-[19] have explicitly modeled both the integer-valued RB allocation and the contiguity
constraints. Specifically, [17] shows that the SU LTE UL scheduling problem is APX-hard, and both [6], [17]
provide deterministic constant-factor approximation algorithms, whereas [18] provides a randomized constant-factor
approximation algorithm. [19] extends the algorithms of [6], [17] to SU-MIMO LTE-A scheduling.
The algorithm proposed in [6] is based on an innovative application of the LRT technique, which was developed
earlier in [5]. However, we emphasize that the algorithms in [6], [17]-[19] cannot incorporate MU scheduling, do
not consider user pre-selection, and also cannot incorporate knapsack constraints. To the best of our knowledge,
the design of approximation algorithms for MU scheduling in the LTE uplink has not been considered before.



II. MU Scheduling in the LTE UL

Consider a single cell with K users and one BS, which is assumed to have N_r ≥ 1 receive antennas. Suppose
that user k has N_t^(k) ≥ 1 transmit antennas and that its power budget is P_k. We let N denote the total number of RBs.

We consider the problem of scheduling users in the frequency domain in a given scheduling interval. Let
w_k, 1 ≤ k ≤ K, denote the weight of the k-th user, which is an input to the scheduling algorithm and is updated
using the output of the scheduling algorithm in every scheduling interval, say according to the proportional
fairness rule [23]. Letting r_k denote the rate assigned to the k-th user (in bits per N RBs), we consider the
following weighted sum rate utility maximization problem:

$$\max \sum_{1 \le k \le K} w_k\, r_k, \qquad (1)$$

where the maximization is over the assignment of resources to the users subject to:

• Decodability constraint: The rates assigned to the scheduled users should be decodable by the base-station
receiver. Notice that unlike SU scheduling, MU scheduling allows for multiple users to be assigned the
same RB. As a result, the rate that can be achieved for user k need not be only a function of the resources
assigned to the k-th user but can also depend on those assigned to the other users.

• One transmit antenna and one power level per user: Each user can transmit using only one power
amplifier due to cost constraints. Thus, only a basic precoding in the form of transmit antenna selection
is possible. In addition, each scheduled user is allowed to transmit with only one power level (or power
spectral density (PSD)) on all its assigned RBs.

• At most one chunk per user and at most T users per RB: The set of RBs assigned to each scheduled
user should form one chunk, where each chunk is a set of contiguous RBs. Further, at most T users can be
co-scheduled on a given RB. T is expected to be a small number, typically two and no greater than four.

• Complete overlap constraint: If any two users are assigned a common RB, then those two users must be
assigned the same set of RBs. A feasible RB allocation and co-scheduling of users in the LTE MU UL is depicted
in Fig. 1.

• Finite buffers and finite MCS: Users in a practical UL will have bursty traffic which necessitates considering 
finite buffers. In addition, only a finite set of MCS (29 possibilities in the LTE network) can be employed. 

• Control channel overhead constraints: Every user that is given a UL grant (i.e., is scheduled on at least
one RB) must be informed about its assigned MCS and the set of RBs on which it must transmit along 
with possibly the transmit antenna it should employ. This information is sent on the DL control channel of 
limited capacity which imposes a limit on the set of users that can be scheduled. In particular, the scheduling 
information of a user is encoded and formatted into one packet (henceforth referred to as a control packet), 
where the size of the control packet must be selected from a predetermined set of sizes. A longer (shorter) 
control packet is used for a cell edge (cell interior) user. In the LTE/LTE-A systems each user is assigned 
one search region when it enters the cell. In each scheduling interval it then searches for the control packet 
(containing the scheduling decisions made for it) only in that region of the downlink control channel, as 
well as a region common to all users. 

• Per sub-band interference limit constraints: Inter-cell interference mitigation is performed by imposing 
interference limit constraints. In particular, on one or more subbands, the cell of interest must ensure that the 
total interference imposed by its scheduled users on a neighboring base-station is below a specified limit. 
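The per-user weights in the weighted sum rate objective (1) are refreshed after every scheduling interval, for instance via the proportional fairness rule [23]. The sketch below uses one common form of that rule (exponentially smoothed throughput, with each weight equal to the inverse of the smoothed throughput); the smoothing window, the helper name `pf_update` and the small floor `eps` are assumptions for illustration and need not match the exact rule used in this paper.

```python
from typing import List, Tuple


def pf_update(avg_rate: List[float], served_rate: List[float],
              window: float = 100.0) -> Tuple[List[float], List[float]]:
    """One proportional-fairness step: smooth the throughputs, then invert them."""
    alpha = 1.0 / window
    # Exponentially smoothed average throughput of each user.
    new_avg = [(1.0 - alpha) * a + alpha * r for a, r in zip(avg_rate, served_rate)]
    # Users that have been served less get a larger weight in the next interval.
    eps = 1e-9  # floor that avoids division by zero for users never served
    weights = [1.0 / max(a, eps) for a in new_avg]
    return new_avg, weights
```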

We define C as the set containing N-length vectors such that each c ∈ C is binary-valued (with elements in {0, 1})
and contains one contiguous sequence of ones, with the remaining elements being zero. Here we say that an
RB i belongs to c (i ∈ c) if c contains a one in its i-th position, i.e., c(i) = 1. Note then that each c ∈ C denotes
a valid assignment of RBs, since it contains one contiguous chunk of RBs. Also, c_1 and c_2 are said to intersect
if there is some RB that belongs to both c_1 and c_2. For any c ∈ C, we will use Tail(c) (Head(c)) to return the
largest (smallest) index that contains a one in c. Thus, each c ∈ C has ones in all positions Head(c), ..., Tail(c)
and zeros elsewhere. Further, we define {G_1, ..., G_L} to be a partition of {1, ..., K}, with the understanding
that all users that belong to a common set (or group) G_s, for any 1 ≤ s ≤ L, are mutually incompatible. In other
words, at most one user from each group G_s can be scheduled in a scheduling interval. Notice that by choosing
L = K and G_s = {s}, 1 ≤ s ≤ K, we obtain the case where all users are mutually compatible. Let us define a
family of subsets, U, as

$$\mathcal{U} = \big\{\, U \subseteq \{1, \dots, K\} : |U| \le T \ \&\ |U \cap \mathcal{G}_s| \le 1, \ \forall\, 1 \le s \le L \,\big\} \qquad (2)$$

and let M = U × C.
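For small instances, the sets C, U and M = U × C can be enumerated directly. The sketch below represents each chunk c by the pair (Head(c), Tail(c)) rather than by an N-length binary vector, and omits the empty user set, since pairs with an empty U contribute nothing to the objective; the function names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Tuple


def chunks(num_rbs: int) -> List[Tuple[int, int]]:
    """All contiguous chunks c in C, as (Head(c), Tail(c)) with 1-indexed RBs."""
    return [(h, t) for h in range(1, num_rbs + 1) for t in range(h, num_rbs + 1)]


def user_sets(groups: List[List[int]], max_users: int) -> List[frozenset]:
    """Non-empty sets U with |U| <= T and at most one user from each group G_s."""
    family = []
    for size in range(1, max_users + 1):
        # Choose `size` distinct groups, then one user from each chosen group.
        for chosen in combinations(groups, size):
            family.extend(frozenset(pick) for pick in product(*chosen))
    return family
```

With all users mutually compatible (singleton groups), K = 10, T = 2 and N = 20, this gives 10 + 45 = 55 user sets and 20·21/2 = 210 chunks, i.e., 11550 pairs, in line with the O(K^T N^2) growth of |M|.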

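We can now pose the resource allocation problem as

$$
\begin{aligned}
\max \; & \sum_{(U,c) \in \mathcal{M}} p(U,c)\, X(U,c), \quad \text{s.t.} \\
& \sum_{(U,c) \in \mathcal{M}:\; U \cap \mathcal{G}_s \neq \phi} X(U,c) \le 1, \quad \text{for each group } \mathcal{G}_s; \\
& \sum_{(U,c) \in \mathcal{M}:\; i \in c} X(U,c) \le 1, \quad \text{for each RB } i; \\
& \sum_{(U,c) \in \mathcal{M}} \beta^{q}(U,c)\, X(U,c) \le 1, \quad 1 \le q \le J; \\
& \sum_{(U,c) \in \mathcal{M}} \alpha^{q}(U,c)\, X(U,c) \le 1, \quad q \in \mathcal{I}; \\
& X(U,c) \in \{0, 1\}, \quad \forall (U,c) \in \mathcal{M},
\end{aligned}
\qquad (3)
$$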

where φ denotes the empty set and X(U, c) is an indicator function that returns one if the users in U are co-scheduled
on the chunk indicated by c. Note that the first constraint ensures that at most one user is scheduled from each
group and that each scheduled user is assigned at most one chunk. In addition, this constraint also enforces
the complete overlap constraint. The second constraint enforces non-overlap among the assigned chunks. Note
that p(U, c) denotes the weighted sum rate obtained upon co-scheduling the users in U on the chunk indicated
by c. We emphasize that there is complete freedom with respect to the computation of p(U, c). Indeed, it can
accommodate finite buffer and practical MCS constraints, account for any particular receiver employed by the
base station, and can also incorporate any rule to assign a transmit antenna and a power level to each user in
U over the chunk c.
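To sanity-check approximation ratios on toy instances, (3) without the knapsack constraints can be solved exactly by exhaustive search. The sketch below is exponential in |M| and is only meant for a handful of pairs; the tuple layout (U, Head(c), Tail(c), p(U, c)), the user-to-group map `group` and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[FrozenSet[int], int, int, float]  # (U, Head(c), Tail(c), p(U, c))


def conflicts(a: Pair, b: Pair, group: Dict[int, int]) -> bool:
    """True if the two pairs cannot both have X(U, c) = 1 in (3)."""
    ua, ha, ta, _ = a
    ub, hb, tb, _ = b
    share_rb = ha <= tb and hb <= ta  # the two chunks intersect
    share_group = {group[k] for k in ua} & {group[k] for k in ub}
    return share_rb or bool(share_group)


def brute_force_opt(pairs: List[Pair], group: Dict[int, int]) -> float:
    """Optimal value of (3) without knapsack constraints, by exhaustive search."""
    best = 0.0

    def search(start: int, chosen: List[Pair], value: float) -> None:
        nonlocal best
        best = max(best, value)
        for j in range(start, len(pairs)):
            # Extend the current selection only with pairs compatible with all of it.
            if all(not conflicts(pairs[j], c, group) for c in chosen):
                chosen.append(pairs[j])
                search(j + 1, chosen, value + pairs[j][3])
                chosen.pop()

    search(0, [], 0.0)
    return best
```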

The first set of J knapsack constraints in (3), where J is arbitrary but fixed, are generic knapsack constraints.
Without loss of generality, we assume that the weight of the pair (U, c) in the q-th knapsack, β^q(U, c), lies in
the interval [0, 1]. Notice that we can simply drop each vacuous constraint, i.e., each constraint q for which
Σ_{(U,c)∈M} β^q(U, c) ≤ 1. The second set of knapsack constraints are column-sparse binary knapsack constraints.
In particular, for each pair (U, c) ∈ M and q ∈ I, we have that α^q(U, c) ∈ {0, 1}. Further, we have that for each
(U, c) ∈ M, Σ_{q∈I} α^q(U, c) ≤ Δ, where Δ is arbitrary but fixed and denotes the column-sparsity level. Note
that here the cardinality of I can scale polynomially in KN, keeping Δ fixed. Together, these two sets of knapsack
constraints can enforce a variety of practical constraints, including the control channel and the interference limit
constraints. For instance, defining a generic knapsack constraint as β^q(U, c) = |U|/K̄, ∀ (U, c) ∈ M, for any given
input K̄, enforces that no more than K̄ users can be scheduled in a given interval, which represents a coarse control
channel constraint. In a similar vein, consider any given choice of a victim adjacent base-station and a sub-band,
with the constraint that the total interference caused to the victim BS by users scheduled in the cell of interest,
over all the RBs in the sub-band, should be no greater than a specified upper bound. This constraint can readily be
modeled using a generic knapsack constraint, where the weight of each pair (U, c) ∈ M is simply the ratio of
the total interference caused by users in U to the victim BS over RBs that are in both c and the specified
sub-band, to the specified upper bound. The interference is computed using the transmission parameters (such
as the power levels, transmit antennas, etc.) that yield the metric p(U, c). A finer modeling of the LTE control
channel constraints is more involved, since it needs to employ the column-sparse knapsack constraints together
with the user incompatibility constraints, and is deferred to Appendix C.
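The two examples of generic knapsack constraints above translate directly into per-pair weights. In the sketch below, the per-user, per-RB interference values `interf`, the sub-band given as a range of RB indices, and all function names are hypothetical inputs chosen for illustration; in practice they would come from the transmission parameters that yield p(U, c).

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[FrozenSet[int], int, int]  # (U, Head(c), Tail(c))


def cardinality_weight(pair: Pair, max_scheduled: int) -> float:
    """beta(U, c) = |U| / Kbar, so that sum(beta * X) <= 1 schedules at most Kbar users."""
    users, _, _ = pair
    return len(users) / max_scheduled


def interference_weight(pair: Pair, interf: Dict[Tuple[int, int], float],
                        subband: range, limit: float) -> float:
    """Interference on the victim BS over RBs in both c and the sub-band, divided by the limit."""
    users, head, tail = pair
    rbs = [i for i in range(head, tail + 1) if i in subband]
    # interf[(k, i)]: hypothetical interference caused by user k on RB i.
    return sum(interf[(k, i)] for k in users for i in rbs) / limit


def is_vacuous(weights: List[float]) -> bool:
    """A generic constraint can be dropped if selecting every pair still satisfies it."""
    return sum(weights) <= 1.0
```

A pair whose weight exceeds one can never be selected and can simply be removed from M, which is why the weights may be taken to lie in [0, 1].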

Note that for given K and N, an instance of the problem in (3) consists of a finite set I of indices, a partition
{G_1, ..., G_L}, metrics {p(U, c)}, ∀ (U, c) ∈ M, and weights {β^q(U, c)}, ∀ (U, c) ∈ M, 1 ≤ q ≤ J, and
{α^q(U, c)}, ∀ (U, c) ∈ M, q ∈ I. Then, in order to solve (3) for a given instance, we first partition the set M
into two parts as M = M^narrow ∪ M^wide, where we define M^narrow = {(U, c) ∈ M : β^q(U, c) ≤ 1/2, ∀ 1 ≤
q ≤ J}, so that M^wide = M \ M^narrow. We then define J sets, V^(1), ..., V^(J), that cover M^wide (note that any
two of these sets can mutually overlap) as (U, c) ∈ V^(q) iff β^q(U, c) > 1/2, for q = 1, ..., J. Recall that T and J are
fixed, and note that the cardinality of M, |M|, is O(K^T N^2) and that M^narrow and {V^(q)} can be determined
in polynomial time. Next, we propose Algorithm I, which possesses the approximation guarantee given below. The complexity
of Algorithm I, which is essentially determined by that of its module Algorithm IIa, scales polynomially in KN
(recall that T is a constant). A detailed discussion of the complexity, along with steps to reduce it, is deferred
to the next section. We offer the following theorem, which is proved in Appendix A.

Theorem 1. The problem in (3) is APX-hard, i.e., there is an ε > 0 such that it is NP-hard to obtain a (1 − ε)
approximation algorithm for (3). Let W^opt denote the optimal weighted sum rate obtained upon solving (3) and
let W denote the weighted sum rate obtained upon using Algorithm I. Then, we have that

\ i+tTa+3J ' Otherwise 
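The partition of M into M^narrow and the cover sets V^(1), ..., V^(J) of M^wide, introduced before Theorem 1, only requires one pass over the generic knapsack weights. In this sketch, each pair is identified by a hypothetical key and mapped to its list of J weights β^q(U, c); the function name is illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def split_pairs(betas: Dict[str, Sequence[float]]) -> Tuple[List[str], List[List[str]]]:
    """Split pairs into M^narrow and the cover sets V^(1), ..., V^(J) of M^wide."""
    # A pair is narrow if all of its generic knapsack weights are at most 1/2.
    narrow = [pid for pid, b in betas.items() if all(x <= 0.5 for x in b)]
    num_knapsacks = len(next(iter(betas.values()), []))
    # V^(q) collects the pairs whose q-th weight exceeds 1/2; the sets may overlap.
    cover = [[pid for pid, b in betas.items() if b[q] > 0.5] for q in range(num_knapsacks)]
    return narrow, cover
```

For example, a pair with weights (0.7, 0.6) lands in both V^(1) and V^(2), illustrating that the cover sets may overlap.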

An interesting observation that follows from the proof of Theorem 1 is that any optimal allocation over M^wide
can include at most one pair from each V^(q), 1 ≤ q ≤ J. Then, since the number of pairs in each V^(q), 1 ≤ q ≤ J,
is O(K^T N^2), we can determine an optimal allocation yielding W^opt,wide via exhaustive enumeration with a high
albeit polynomial complexity (recall that T and J are assumed to be fixed). Thus, by using exhaustive enumeration
instead of Algorithm IIb, we can claim the following result.

Corollary 1. Let W^opt denote the optimal weighted sum rate obtained upon solving (3) and let W denote the
weighted sum rate obtained upon using Algorithm I, albeit with exhaustive enumeration over M^wide. Then, we
have that

W°^* T-P A ^ wide i 



, If 



2+St2J' Otherwise 
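The claim that exhaustive enumeration over M^wide has polynomial complexity can be checked with a short counting argument (our own sketch, not taken from the proof of Theorem 1). Two pairs in the same V^(q) have q-th weights summing to more than one, so a feasible allocation contains at most one pair from each V^(q); since every pair in M^wide lies in some V^(q), it contains at most J pairs of M^wide. Hence the number of candidate allocations to enumerate is bounded as

```latex
\#\{\text{candidate allocations over } \mathcal{M}^{\text{wide}}\}
\;\le\; \sum_{i=0}^{J} \binom{|\mathcal{M}^{\text{wide}}|}{i}
\;\le\; \big(|\mathcal{M}^{\text{wide}}| + 1\big)^{J}
\;=\; O\big((K^{T} N^{2})^{J}\big),
```

which is polynomial in KN because T and J are fixed; each candidate additionally requires a feasibility check against the constraints in (3).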

Remark 1. Some intuition on the process at the heart of Algorithm I (which is Algorithm IIa) is in order. Note
that Algorithm IIa has two stages. The first one (comprising steps 1 through 16) begins by initializing an
empty stack S and defining the current gain of each pair to be equal to its metric. Then, promising pairs are
successively added to the top of the stack S. Each time a pair is pushed onto the stack, the current gain of each
pair that can potentially be added and which conflicts with the pair just added (in terms of sharing a common
RB, each having a user that belongs to the same group, or each having a unit weight in a common sparse
knapsack constraint in I) is decremented by the current gain of the added pair. The idea behind this operation
is that eventually only one pair among these conflicting pairs can be selected, so by decrementing the gains we
ensure that a conflicting pair can be added in a later step only if it has a larger gain. Similarly, the gain of a
non-conflicting pair is also decremented by its largest generic knapsack weight, max_{1≤q≤J} β^q(U, c), times twice
the current gain of the added pair, in order to account for the non-sparse knapsack constraints. At the end of the
first stage, the stack S contains a set of promising pairs, but the entire set need not be feasible for (3). In the
second stage, another stack S' is formed by successively picking pairs from the top of stack S and adding them to
S' if feasibility is satisfied. Note that the top-down approach of picking pairs from S is intuitively better, since
pairs at the top will have larger metrics than the pairs below with which they conflict.

For notational simplicity, henceforth, unless otherwise mentioned, we assume that all users are mutually
compatible, i.e., L = K with G_s = {s}, 1 ≤ s ≤ K.
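The two stages described in Remark 1 can be sketched in a few lines for the simplest setting: all users mutually compatible, J = 0, no column-sparse constraints, and all metrics p(U, c) precomputed. This is our own minimal illustration of the local ratio idea, not a transcription of Algorithm IIa; it sweeps the RBs and lets only pairs with Tail(c) = j compete at step j, as in the on-demand computation of Section III. The names `lrt_schedule` and `conflict` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[FrozenSet[int], int, int, float]  # (U, Head(c), Tail(c), p(U, c))


def conflict(a: Pair, b: Pair) -> bool:
    """Two pairs conflict if they share a user or an RB (singleton groups, no sparse knapsacks)."""
    return bool(a[0] & b[0]) or (a[1] <= b[2] and b[1] <= a[2])


def lrt_schedule(pairs: List[Pair], num_rbs: int) -> List[Pair]:
    """Two-stage local-ratio sketch for J = 0: push promising pairs, then unwind."""
    offset = [0.0] * len(pairs)  # accumulated decrement of each pair's gain
    stack: List[int] = []
    # Stage 1: sweep the RBs; at step j only pairs with Tail(c) = j compete.
    for j in range(1, num_rbs + 1):
        cands = [i for i, p in enumerate(pairs) if p[2] == j]
        if not cands:
            continue
        best = max(cands, key=lambda i: pairs[i][3] - offset[i])
        gain = pairs[best][3] - offset[best]
        if gain <= 0:
            continue
        stack.append(best)
        # Decrement the current gain of every later pair conflicting with the pushed one.
        for i, p in enumerate(pairs):
            if p[2] > j and conflict(p, pairs[best]):
                offset[i] += gain
    # Stage 2: pop from the top and keep each pair that is still feasible.
    chosen: List[Pair] = []
    while stack:
        pair = pairs[stack.pop()]
        if all(not conflict(pair, c) for c in chosen):
            chosen.append(pair)
    return chosen
```

With J > 0, each non-conflicting later pair would additionally lose twice the pushed gain times its largest generic knapsack weight, matching the second decrement described in Remark 1.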

III. Complexity Reduction 

In this section, we present key techniques to significantly reduce the complexity of our proposed local ratio test
based multi-user scheduling algorithm. As noted before, the complexity of Algorithm I is dominated by that of its
component Algorithm IIa. Accordingly, we focus our attention on Algorithm IIa and, without loss of generality,
assume that M = M^narrow. Notice that hitherto we have assumed that all the metrics {p(U, c) : (U, c) ∈ M}
are available. In practice, computing these O(K^T N^2) metrics, which are often complicated non-linear functions,
is the main bottleneck and indeed must be accounted for in the complexity analysis. Before proceeding, we make
the following assumption, which is satisfied by all physically meaningful metrics.

Assumption 1. Sub-additivity: We assume that for any (U, c) ∈ M,

$$p(U,c) \le p(U_1,c) + p(U_2,c), \quad \forall\, U_1, U_2 : U = U_1 \cup U_2. \qquad (6)$$
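As a quick numerical illustration of Assumption 1, consider a toy metric: the unweighted sum rate log2(1 + Σ_k s_k) of users with received SNRs s_k that are jointly decoded by a single-antenna receiver over one RB. Since (1 + s_1)(1 + s_2) ≥ 1 + s_1 + s_2, this metric is sub-additive, and the sketch below checks (6) on random SNRs. The metric and the function names are illustrative assumptions, not the metrics used in this paper's evaluations.

```python
import math
import random
from typing import Sequence


def toy_metric(snrs: Sequence[float]) -> float:
    """Sum rate (bits/s/Hz) of users jointly decoded at a single-antenna receiver."""
    return math.log2(1.0 + sum(snrs))


def check_subadditive(trials: int = 1000, seed: int = 0) -> bool:
    """Test p({1, 2}) <= p({1}) + p({2}) on random pairs of linear SNRs."""
    rng = random.Random(seed)
    for _ in range(trials):
        s1, s2 = rng.expovariate(0.1), rng.expovariate(0.1)  # linear SNRs with mean 10
        if toy_metric([s1, s2]) > toy_metric([s1]) + toy_metric([s2]) + 1e-12:
            return False
    return True
```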

The following features can then be exploited for a significant reduction in complexity. 

• On-demand metric computation: Notice in Algorithm IIa that the metric for any (U, c) ∈ M, where
Tail(c) = j for some j = 1, ..., N, needs to be computed only at the j-th iteration, at which point we need
to determine

$$p'(U,c) = p(U,c) - \Gamma^{(j)}(U,c), \qquad (7)$$

where the offset factor Γ^(j)(U, c) is given by

$$\Gamma^{(j)}(U,c) = \sum_{m=1}^{j-1} \Big( \tilde{p}(U^*_m,c^*_m)\, \mathcal{E}\big((U,c),(U^*_m,c^*_m)\big) + 2\, \tilde{p}(U^*_m,c^*_m)\, \bar{\mathcal{E}}\big((U,c),(U^*_m,c^*_m)\big) \max_{1 \le q \le J} \beta^{q}(U,c) \Big),$$

and where p̃(U*_m, c*_m) is equal to the p'(U*_m, c*_m) computed for the pair selected at the m-th iteration, with
m ≤ j − 1 (taken to be zero if no pair was selected at that iteration), and E((U, c), (U*_m, c*_m)) denotes an indicator
(with Ē((U, c), (U*_m, c*_m)) = 1 − E((U, c), (U*_m, c*_m))) which is true when U*_m ∩ U ≠ φ, or c ∩ c*_m ≠ φ, or
∃ q ∈ I : α^q(U*_m, c*_m) = α^q(U, c) = 1. Further, note that p'(U, c) in (7) is required only if it is strictly positive.
Then, an important observation is that if, at the j-th iteration, we have already computed p(U_1, c) and p(U_2, c)
for some U_1, U_2 : U = U_1 ∪ U_2, then invoking the sub-additivity property we have that

$$p'(U,c) \le p(U_1,c) + p(U_2,c) - \Gamma^{(j)}(U,c), \qquad (8)$$

so that if the RHS in (8) is not strictly positive, or if it is less than the greatest value of p'(U', c') computed
in the current iteration for some other pair (U', c') : Tail(c') = j, then we do not need to compute p'(U, c)
and hence the metric p(U, c).
• Selective update: Note that in the j-th iteration, once the best pair (U*_j, c*_j) is selected and it is determined
that p'(U*_j, c*_j) > 0, we need to update only the pairs (U', c') with Tail(c') ≥ j + 1, since only such pairs
will be considered in future iterations. Thus, the offset factors {Γ^(j)(U', c')} need to be updated only for
such pairs, via

$$\Gamma^{(j+1)}(U',c') = \Gamma^{(j)}(U',c') + p'(U^*_j,c^*_j)\, \mathcal{E}\big((U',c'),(U^*_j,c^*_j)\big) + 2\, p'(U^*_j,c^*_j)\, \bar{\mathcal{E}}\big((U',c'),(U^*_j,c^*_j)\big) \max_{1 \le q \le J} \beta^{q}(U',c').$$

Further, if by exploiting sub-additivity we can deduce that p'(U', c') ≤ 0 for any such pair, then we can
drop such a pair along with its offset factor from future consideration.
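The on-demand computation and pruning described above can be sketched for a single iteration j as follows. The sketch assumes that the offset factors Γ^(j)(U, c) are maintained by the caller via the selective-update rule, and that `metric` is a hypothetical (expensive) callable returning p(U, c); computed metrics are cached so that metrics of smaller user sets can bound those of larger ones through (8). All names are illustrative.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Key = Tuple[FrozenSet[int], int, int]  # (U, Head(c), Tail(c)), all with Tail(c) = j


def best_pair_at_step(cands: List[Key], offset: Dict[Key, float],
                      metric: Callable[[Key], float],
                      cache: Dict[Key, float]) -> Tuple[Optional[Key], float]:
    """Pick the pair maximizing p' = p - Gamma, skipping metrics that bound (8) rules out."""
    best: Optional[Key] = None
    best_gain = 0.0
    # Smaller user sets first, so their cached metrics can bound the larger ones.
    for key in sorted(cands, key=lambda k: len(k[0])):
        users, head, tail = key
        bound = float("inf")
        for size in range(1, len(users)):
            for part in combinations(sorted(users), size):
                k1 = (frozenset(part), head, tail)
                k2 = (users - frozenset(part), head, tail)
                if k1 in cache and k2 in cache:
                    bound = min(bound, cache[k1] + cache[k2] - offset[key])
        if bound <= best_gain:
            continue  # p' cannot be positive or cannot beat the current best
        cache[key] = metric(key)  # the expensive metric, computed on demand
        gain = cache[key] - offset[key]
        if gain > best_gain:
            best, best_gain = key, gain
    return best, best_gain
```

Processing user sets in increasing size is one simple ordering that makes the bound available; other orderings are possible.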

IV. Improving Performance via a second phase 

A potential drawback of the LRT based algorithm is that some RBs may remain unutilized, i.e., they may
not be assigned to any user. Notice that when the final stack S' is built in the while-loop of Algorithm IIa, an
allocation or pair from the top of stack S is added to stack S' only if it does not violate feasibility when considered
together with those already in stack S'. Often, multiple pairs from S are dropped due to such feasibility violations,
resulting in spectral holes formed by unassigned RBs. To mitigate this problem, we perform a second phase. The
second phase consists of running Algorithm IIa again, albeit with modified metrics {p̃(U, c) : (U, c) ∈ M^narrow},
which are obtained via the following steps.

1) Initialize p̃(U, c) = p(U, c), ∀ (U, c) ∈ M^narrow. Let S' be obtained as the output of Algorithm IIa when
it is implemented first.

2) For each (U, c) ∈ S', we ensure that no user in U is scheduled by phase two in any other user set
save U, by setting

$$\tilde{p}(U',c') = 0 \quad \text{if } U' \neq U \ \&\ U' \cap U \neq \phi, \quad \forall (U',c') \in \mathcal{M}^{\text{narrow}}. \qquad (9)$$

3) For each (U, c) ∈ S', we ensure that no other user set save U is assigned any RB in c, by setting

$$\tilde{p}(U',c') = 0 \quad \text{if } U' \neq U \ \&\ c' \cap c \neq \phi, \quad \forall (U',c') \in \mathcal{M}^{\text{narrow}}. \qquad (10)$$

4) For each (U, c) ∈ S', we ensure that the allocation (U, c) is either unchanged by phase two or is expanded,
by setting

$$\tilde{p}(U,c') = \begin{cases} p(U,c'), & \text{if } \mathrm{Tail}(c') \ge \mathrm{Tail}(c) \ \&\ \mathrm{Head}(c') \le \mathrm{Head}(c), \\ 0, & \text{otherwise}. \end{cases}$$



A consequence of using the modified metrics is that the second phase has significantly lower complexity, since a
large fraction of the allocations are disallowed (many of the modified metrics are zero). While the second
phase does not offer any improvement in the approximation factor, simulation results presented in the sequel
reveal that it offers a good performance improvement with very little additional complexity.
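Steps 1) to 4) only keep or zero out metrics, so they can be applied in a single pass over the first-phase output. In the sketch below, metrics are stored in a dictionary keyed by (U, Head(c), Tail(c)); this layout and the function name are hypothetical. We also read the steps as purely zeroing, so that an expansion of (U, c) into an RB of another first-phase chunk stays disallowed; this is our interpretation of how the steps combine.

```python
from typing import Dict, FrozenSet, List, Tuple

Key = Tuple[FrozenSet[int], int, int]  # (U, Head(c), Tail(c))


def phase_two_metrics(metrics: Dict[Key, float], first_phase: List[Key]) -> Dict[Key, float]:
    """Modified metrics for the second run of the LRT scheduler (steps 1 to 4)."""
    mod = dict(metrics)  # step 1: start from the original metrics
    for users, head, tail in first_phase:
        for key in mod:
            other, h2, t2 = key
            if other != users:
                # Step 2: users of U may not appear in any other user set.
                # Step 3: no other user set may use an RB of c.
                if (other & users) or (h2 <= tail and head <= t2):
                    mod[key] = 0.0
            elif not (h2 <= head and t2 >= tail):
                # Step 4: U keeps its chunk c or grows it (c' must contain c).
                mod[key] = 0.0
    return mod
```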

V. Simulation Results: Single-Cell Setup

In this section, we evaluate key features of our proposed algorithm over an idealized single-cell setup. In
particular, we simulate an uplink wherein the BS is equipped with four receive antennas. The system has 280
sub-carriers divided into 20 RBs (of 14 sub-carriers each), available as data subcarriers that are used for
serving the users. We assume 10 active users, all of whom have identical maximum transmit powers. We model
the fading channel between each user and the BS as a six-path equal-gain i.i.d. Rayleigh fading channel. In all
the results given below, we assume an infinitely backlogged traffic model. For simplicity, we assume that there
are no knapsack constraints and that at most two users can be co-scheduled on an RB (i.e., J = 0, Δ = 0 and
T = 2). Further, each user can employ ideal Gaussian codes and, upon being scheduled, divides its maximum
transmit power equally among its assigned RBs. Notice that since M = M^narrow, we can directly use Algorithm
IIa.

In Fig. 2 we plot the average cell spectral efficiency (in bits/s/Hz) versus the average transmit SNR (in dB) for an uplink where each user has one transmit antenna and the BS employs the linear MMSE receiver. We plot the spectral efficiencies achieved when Algorithm IIa is employed with and without the second phase (described in Section IV), denoted in the legend by MU-MMSE-LRT-2Step and MU-MMSE-LRT-1Step, respectively. Also plotted are the upper bound obtained from the linear programming (LP) relaxation of (3) and the spectral efficiency obtained upon rounding the LP solution to ensure feasibility (denoted in the legend by MU-MMSE-LP-UB and MU-MMSE-LP-Rounding, respectively). In Fig. 3 we plot the average cell spectral efficiency versus the average transmit SNR for an uplink where each user has one transmit antenna and the BS employs the successive interference cancellation (SIC) receiver. We plot the spectral efficiencies achieved when Algorithm IIa is employed with and without the second phase (denoted in the legend by MU-SIC-LRT-2Step and MU-SIC-LRT-1Step, respectively). Also plotted are the corresponding LP upper bound and the spectral efficiency obtained upon rounding the LP solution. Figures 4 and 5 are the counterparts of Figures 2 and 3, respectively, but where each user has two transmit antennas and the BS can thus exploit transmit antenna selection. Finally, in Fig. 6 we plot the normalized spectral efficiencies obtained by dividing each spectral efficiency by the one yielded by Algorithm IIa when only single-user (SU) scheduling is allowed, which in turn can be emulated by setting all metrics $p(\mathcal{U}, c) : (\mathcal{U}, c) \in \mathcal{M}$ in (3) to zero whenever $|\mathcal{U}| \geq 2$.³ In all considered schemes we assume that Algorithm IIa with the second phase is employed. From Figures 2 to 6 we have the following observations:

• For both SIC and MMSE receivers, the performance of Algorithm IIa is more than 80% of the respective LP upper bounds, which is much better than the worst-case guarantee of 1/3 (obtained by specializing Theorem 1 with $\mathcal{M}^{\mathrm{wide}} = \emptyset$, $T = 2$ and $A = J = 0$). Further, for both receivers the performance of Algorithm IIa with the second phase is more than 90% of the respective LP upper bounds. The same conclusions can be drawn when antenna selection is also exploited by the BS. In all cases, the performance of the LP plus rounding scheme is exceptional and within 2% of the respective upper bound. However, the complexity of this LP appears to be prohibitive for practical implementation at present.⁴

• The SIC receiver results in a small gain (1.5% to 2.5%) over the MMSE receiver. This gain will increase if we consider more correlated fading, which exposes the limitations of linear receivers, and as the maximum number of users that can be co-scheduled on an RB ($T$) is increased, since SIC allows for improved system rates via co-scheduling a larger number of users on an RB, whereas the MMSE receiver becomes interference limited. Note that antenna selection seems to provide a much larger gain (6% to 8%) than the one offered by the advanced SIC receiver. This observation must be tempered by the facts that the simulated scenario of independent (uncorrelated) fading is favorable for antenna selection, and that the antenna switching loss (about 0.5 dB in practical devices) as well as the additional pilot overhead have been neglected.

³ Note that for SU scheduling the MMSE and SIC receivers are equivalent.

⁴ For instance, this LP involves about 11,500 variables and must be solved within each scheduling interval, whose duration in LTE systems is one millisecond.
• MU scheduling offers substantial gains over SU scheduling (ranging from 50% to 75% for the considered SNRs). This follows since the degrees of freedom available here for MU scheduling are twice those for SU scheduling.

Next, in Fig. 7 we plot the normalized complexities for the scheduling schemes considered in Figures 2 to 5. Here the complexity of a scheduling scheme is determined by the complexity of the metric computations it makes. In all cases the second phase is performed for Algorithm IIa and, more importantly, the sub-additivity property together with the on-demand metric computation feature are exploited, as described in Section III, to avoid redundant metric computations. All schemes compute the metrics $\{p(\mathcal{U}, c) : (\mathcal{U}, c) \in \mathcal{M} \ \& \ |\mathcal{U}| = 1\}$, and each such metric is deemed to have unit complexity when each user has one transmit antenna and a complexity of two units when each user has two transmit antennas. On the other hand, for each evaluated $p(\mathcal{U}, c) : |\mathcal{U}| = 2$, the complexity is taken to be two units when each user has one transmit antenna and the BS employs the MMSE receiver, and one unit when each user has one transmit antenna and the BS employs the SIC receiver. The latter stems from the fact that with the SIC receiver one of the users sees an interference-free channel, so its contribution to the metric is equal to the already computed single-user metric for the allocation in which that user is scheduled alone on the corresponding chunk. Similarly, for each evaluated $p(\mathcal{U}, c) : |\mathcal{U}| = 2$, the complexity is taken to be eight units when each user has two transmit antennas and the BS employs transmit antenna selection together with the MMSE receiver, and four units when the BS employs transmit antenna selection together with the SIC receiver. MMSE-Total and SIC-Total denote the complexities obtained by counting the corresponding complexities for all pairs $(\mathcal{U}, c) \in \mathcal{M}$, whereas MMSE-AS-Total and SIC-AS-Total denote the total complexities obtained when antenna selection is employed by the BS together with the MMSE receiver and the SIC receiver, respectively. All complexities in Fig. 7 are normalized by MMSE-AS-Total. The key takeaway from Fig. 7 is that exploiting sub-additivity together with on-demand metric computation can result in a very significant complexity reduction. In particular, as per our definition of complexity, a reduction of more than 80% can be obtained for the MMSE receiver and of more than 75% for the SIC receiver, with the respective gains being larger when antenna selection is also exploited. Further, for Algorithm IIa the second phase itself adds a very small complexity overhead but results in a large performance improvement. To illustrate, for the MMSE receiver the complexity overhead ranges from 2% to 4%, whereas the performance improvement ranges from 9% to 13%. Finally, in Fig. 8 we conduct a complexity comparison identical to that in Fig. 7, except that the complexity computed for each $p(\mathcal{U}, c)$ is now also multiplied by the size of the chunk indicated by $c$. Notice that the complexity reductions achieved by exploiting the sub-additivity property together with the on-demand metric computation feature are now even larger.
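The unit-cost accounting described above amounts to a small lookup table. The Python sketch below is ours and assumes at most two users per RB ($T = 2$, as in this setup), describes each evaluated metric by its number of users and its chunk size, and uses hypothetical names (`UNIT_COST`, `metric_complexity`); setting `per_rb=True` applies the chunk-size weighting used for Fig. 8.

```python
# (receiver, antenna_selection) -> (cost of a |U| = 1 metric, cost of a |U| = 2 metric)
UNIT_COST = {
    ("MMSE", False): (1, 2),
    ("SIC", False): (1, 1),   # one user of the pair reuses its single-user metric
    ("MMSE", True): (2, 8),
    ("SIC", True): (2, 4),
}


def metric_complexity(evaluated, receiver, antenna_selection, per_rb=False):
    """Total complexity units of the metrics p(U, c) that were actually computed.

    evaluated: iterable of (num_users, chunk_size) tuples, one per computed metric
    per_rb: if True, weight each metric's cost by its chunk size (Fig. 8 style)
    """
    single, pair = UNIT_COST[(receiver, antenna_selection)]
    total = 0
    for num_users, chunk_size in evaluated:
        cost = single if num_users == 1 else pair
        total += cost * (chunk_size if per_rb else 1)
    return total


evaluated = [(1, 1), (1, 2), (2, 1), (2, 3)]
print(metric_complexity(evaluated, "MMSE", False))               # 6
print(metric_complexity(evaluated, "MMSE", False, per_rb=True))  # 11
print(metric_complexity(evaluated, "SIC", True))                 # 12
```

Normalizing such totals by the MMSE-AS-Total count over all pairs in $\mathcal{M}$ would give values comparable to those plotted in Fig. 7.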

VI. Sequential LRT based MU Scheduling 
We next propose a sequential LRT based MU scheduling method that yields a scheduling decision over $\mathcal{M}^{\mathrm{narrow}}$. As before, our focus is on avoiding as many metric computations as possible. The idea is to implement the LRT based MU scheduling algorithm in $T$ iterations, where we recall that $T$ denotes the maximum number of users that can be co-scheduled on an RB. In particular, in the first iteration we define metrics $\tilde{p}(\mathcal{U}, c) = p(\mathcal{U}, c), \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}} : |\mathcal{U}| = 1$, with $\tilde{p}(\mathcal{U}, c) = 0$ otherwise, and use these metrics in Algorithm IIa to obtain a tentative scheduling decision. Further, in the $s$-th iteration, where $2 \leq s \leq T - 1$, we first perform the following steps to obtain metrics $\tilde{p}(\mathcal{U}, c), \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$, of which only a few are positive, and then use them in Algorithm IIa to obtain a tentative decision.

• Initialize $\tilde{p}(\mathcal{U}, c) = 0, \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$, and let $S'$ denote the output obtained from the previous iteration.

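• For each $(\mathcal{U}, c) \in S'$, we ensure that any user in set $\mathcal{U}$ can be scheduled in the $s$-th iteration only as part of a set that contains all users in $\mathcal{U}$ along with at most one additional user, by setting

$$\tilde{p}(\mathcal{U}', c') = 0 \ \text{ if } \big(\mathcal{U} \not\subseteq \mathcal{U}' \ \& \ \mathcal{U}' \cap \mathcal{U} \neq \emptyset\big) \ \text{or} \ \big(|\mathcal{U}'| > |\mathcal{U}| + 1 \ \& \ \mathcal{U}' \cap \mathcal{U} \neq \emptyset\big), \quad \forall (\mathcal{U}', c') \in \mathcal{M}^{\mathrm{narrow}}. \tag{11}$$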

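• For each $(\mathcal{U}, c) \in S'$, we also ensure that any user in set $\mathcal{U}$ must be assigned all RBs in $c$, by considering each $(\mathcal{U}', c') \in \mathcal{M}^{\mathrm{narrow}} : \mathcal{U} \subseteq \mathcal{U}' \ \& \ |\mathcal{U}'| \leq |\mathcal{U}| + 1$, and setting

$$\tilde{p}(\mathcal{U}', c') = \begin{cases} p(\mathcal{U}', c'), & \text{if } \mathrm{Tail}(c') \geq \mathrm{Tail}(c) \ \& \ \mathrm{Head}(c') \leq \mathrm{Head}(c), \\ 0, & \text{otherwise}. \end{cases} \tag{12}$$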



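In the last iteration, i.e., when $s = T$, we initialize $\tilde{p}(\mathcal{U}, c) = p(\mathcal{U}, c), \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$. Then, using the set $S'$ obtained as the output of the $(T-1)$-th iteration, we perform the two aforementioned steps. Additionally, to ensure non-overlapping chunk allocation, for each $(\mathcal{U}, c) \in S'$ we set

$$\tilde{p}(\mathcal{U}', c') = 0 \ \text{ if } c' \cap c \neq \emptyset \ \& \ \mathcal{U}' \cap \mathcal{U} = \emptyset, \quad \forall (\mathcal{U}', c') \in \mathcal{M}^{\mathrm{narrow}}. \tag{14}$$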

Note that the different initialization chosen for the last iteration seeks to select a larger pool of positive metrics and can improve performance, albeit at an increased complexity. In addition, after each iteration we enforce an improvement condition, which checks whether the weighted sum rate yielded by the obtained decision is strictly greater than that computed at the end of the previous iteration. If this condition is satisfied, we proceed to the next iteration; otherwise the process is terminated and the solution obtained at the end of the previous iteration is returned. Notice that in each iteration only a small subset of all the metrics is selected, namely those whose corresponding pairs are compatible (as defined by the aforementioned conditions) with the tentative scheduling decision output by the previous iteration. Next, we offer an approximation result for the sequential LRT based MU scheduling that holds under mild assumptions.
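The per-iteration masking and the improvement condition can be sketched in Python as below. This is our own illustrative reading, not the authors' implementation: we treat a pair as admissible only if it is compatible with every element of $S'$, so that an exclusion from (11), (12) or (14) is never overwritten by another element of $S'$, and `toy_schedule` is a hypothetical greedy stand-in for Algorithm IIa that exists only so the example runs.

```python
def overlaps(c1, c2):
    return c1[0] <= c2[1] and c2[0] <= c1[1]


def contains(c_big, c_small):
    return c_big[0] <= c_small[0] and c_big[1] >= c_small[1]


def compatible(pair, ref, last):
    """Is candidate (U', c') admissible given a scheduled (U, c) in S'?"""
    (users2, chunk2), (users, chunk) = pair, ref
    if users2 & users:
        # (11): must contain all of U plus at most one additional user.
        if not users <= users2 or len(users2) > len(users) + 1:
            return False
        # (12): must keep all RBs in c.
        return contains(chunk2, chunk)
    # (14), last iteration only: disjoint user sets may not reuse RBs in c.
    return not (last and overlaps(chunk2, chunk))


def iteration_metrics(p, prev, last):
    """Masked metrics for an iteration s >= 2, given the previous decision prev."""
    out = {}
    for pair, value in p.items():
        ok = all(compatible(pair, ref, last) for ref in prev)
        # Non-last iterations start from zero, so only extensions of S' survive.
        extends = any(pair[0] & ref[0] for ref in prev)
        out[pair] = value if ok and (last or extends) else 0.0
    return out


def toy_schedule(metrics):
    """Hypothetical stand-in for Algorithm IIa: greedy by metric value."""
    chosen = []
    for pair, value in sorted(metrics.items(), key=lambda kv: kv[1], reverse=True):
        if value <= 0:
            break
        if all(not (pair[0] & q[0]) and not overlaps(pair[1], q[1]) for q in chosen):
            chosen.append(pair)
    return chosen, sum(metrics[q] for q in chosen)


def sequential_lrt(p, T, schedule=toy_schedule):
    # Iteration 1: single-user metrics only.
    first = {pair: (v if len(pair[0]) == 1 else 0.0) for pair, v in p.items()}
    best, best_w = schedule(first)
    for s in range(2, T + 1):
        cand, w = schedule(iteration_metrics(p, best, last=(s == T)))
        if w <= best_w:  # improvement condition failed: keep previous decision
            break
        best, best_w = cand, w
    return best, best_w


p = {
    (frozenset({0}), (1, 1)): 3.0,
    (frozenset({1}), (2, 2)): 2.0,
    (frozenset({2}), (2, 2)): 1.0,
    (frozenset({0, 1}), (1, 1)): 4.0,
    (frozenset({0, 2}), (1, 1)): 5.0,
    (frozenset({1, 2}), (2, 2)): 2.5,
}
decision, w = sequential_lrt(p, T=2)
print(sorted((sorted(u), c) for u, c in decision), w)
```

In this toy run the first iteration schedules users 0 and 1 alone (rate 5), and the second iteration pairs user 2 with user 0 on RB 1 while keeping user 1 on RB 2 (rate 7), so the improvement condition accepts it.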

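Assumption 2. Suppose $\mathcal{F}$ is any allocation $\{(\mathcal{U}, c)\}$ that is feasible for (3). Then $\mathcal{F}$ is downward closed in the following sense: any allocation $\mathcal{F}'$ constructed as $\mathcal{F}' = \{(\mathcal{U}', c) : \mathcal{U}' \subseteq \mathcal{U} \ \& \ (\mathcal{U}, c) \in \mathcal{F}\}$ is also feasible.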

Proposition 1. Suppose that Assumptions 1 and 2 are satisfied. Let the weighted sum rate yielded by the sequential LRT based MU scheduling over $\mathcal{M}^{\mathrm{narrow}}$ be denoted by $W^{\mathrm{seq\text{-}narrow}}$. Then

$$W^{\mathrm{seq\text{-}narrow}} \geq \frac{W^{\mathrm{opt,narrow}}}{T(2 + A + 2J)}. \tag{15}$$

Proof: Let $\mathcal{F}^{\mathrm{opt,narrow}}$ denote an optimal allocation of pairs from $\mathcal{M}^{\mathrm{narrow}}$ that yields a weighted sum rate $W^{\mathrm{opt,narrow}}$. Initialize $\mathcal{F}' = \emptyset$. Then, for each $(\mathcal{U}, c) \in \mathcal{F}^{\mathrm{opt,narrow}}$, determine the best user $u = \arg\max_{u \in \mathcal{U}} \{p(u, c)\}$ and insert the pair $(u, c)$ into $\mathcal{F}'$. Note that due to the sub-additivity property in Assumption 1 we must have that $p(u, c) \geq p(\mathcal{U}, c)/|\mathcal{U}| \geq p(\mathcal{U}, c)/T$. Consequently, the weighted sum rate yielded by $\mathcal{F}'$ is at least $W^{\mathrm{opt,narrow}}/T$. Furthermore, on account of Assumption 2, $\mathcal{F}'$ is a feasible allocation for (3). Then, suppose $\mathcal{F}^{(1)}$ is the allocation obtained after the first iteration of the sequential algorithm. Since this allocation is the result of applying Algorithm IIa with single-user metrics, upon invoking Theorem 1 we can claim that the weighted sum rate yielded by $\mathcal{F}^{(1)}$ is at least a fraction $1/(2 + A + 2J)$ of that of the optimal single-user allocation, where a single-user allocation is one in which each pair includes only one user. Then, since $\mathcal{F}'$ is one such single-user allocation, we can claim that the weighted sum rate yielded by $\mathcal{F}^{(1)}$ is at least $W^{\mathrm{opt,narrow}}/(T(2 + A + 2J))$. Finally, since the improvement condition ensures that the weighted sum rates yielded by the tentative allocations across iterations are monotonically increasing, we can deduce that the proposition is true. ∎
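As a quick arithmetic check, the worst-case guarantees quoted in this paper can be evaluated with exact fractions. The sketch below assumes the narrow-set bound $1/(1+T+A+2J)$ of (25) (Theorem 1 with $\mathcal{M}^{\mathrm{wide}} = \emptyset$) and the bound (15) of Proposition 1; the function names are ours.

```python
from fractions import Fraction


def lrt_factor(T, A, J):
    """Worst-case fraction of W^{opt,narrow} achieved by Algorithm IIa, per (25)."""
    return Fraction(1, 1 + T + A + 2 * J)


def sequential_factor(T, A, J):
    """Worst-case fraction achieved by sequential LRT, per Proposition 1."""
    return Fraction(1, T * (2 + A + 2 * J))


print(lrt_factor(T=2, A=0, J=0))         # 1/3, single-cell setup of Section V
print(lrt_factor(T=2, A=0, J=1))         # 1/5, one knapsack constraint (Section VIII)
print(sequential_factor(T=2, A=0, J=0))  # 1/4
```

For $T = 1$ the two expressions coincide, as expected since the sequential method then reduces to a single run of Algorithm IIa; for $T = 2$ the sequential guarantee (1/4) is weaker than the 1/3 of Algorithm IIa, in line with its role as a lower-complexity alternative.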

VII. User Pre-Selection 

In a practical cellular system the number of active users can be large. Moreover, the control channel constraints may limit the BS to serving a much smaller subset of users. It thus makes sense from a complexity standpoint to pre-select a pool of good users and then use the MU scheduling algorithm on the selected pool. Here we propose a few user pre-selection algorithms. For convenience, wherever needed, we assume that at most two users can be co-scheduled on an RB (i.e., $T = 2$), which happens to be the most typical value.

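Before proceeding we need to define some terms that will be required later. Suppose that each user has one transmit antenna and let $\mathbf{h}_{u,j}$ denote the effective channel vector seen at the BS from user $u$ on RB $j$, where $1 \leq u \leq K$ and $1 \leq j \leq N$. Note that the effective channel vector includes the fading as well as the path loss factor and a transmit power value. Then, letting $w_u$ denote the PF weight of user $u$, we define the following metrics:

• Consider first the weighted rate that the system can obtain when it schedules user $u$ alone on RB $j$,

$$p^{\mathrm{su}}(u, j) = w_u \log\big(1 + \mathbf{h}_{u,j}^{\dagger}\mathbf{h}_{u,j}\big), \quad \forall\, 1 \leq u \leq K \ \& \ 1 \leq j \leq N. \tag{16}$$

• Let $\mathcal{U} = \{u, v\} : u \neq v$ be any pair of users and suppose that the BS employs the MMSE receiver. Then the weighted sum rate obtained by scheduling the user pair $\mathcal{U}$ on RB $j$ is given by

$$p^{\mathrm{mmse}}(\mathcal{U}, j) = w_u \log\big(1 + \mathbf{h}_{u,j}^{\dagger}(\mathbf{I} + \mathbf{h}_{v,j}\mathbf{h}_{v,j}^{\dagger})^{-1}\mathbf{h}_{u,j}\big) + w_v \log\big(1 + \mathbf{h}_{v,j}^{\dagger}(\mathbf{I} + \mathbf{h}_{u,j}\mathbf{h}_{u,j}^{\dagger})^{-1}\mathbf{h}_{v,j}\big). \tag{17}$$

• Finally, assume that the BS employs the SIC receiver and let $u = \arg\max_{s \in \mathcal{U}}\{w_s\}$ and $v = \mathcal{U} \setminus u$. Then the weighted sum rate obtained by scheduling the user pair $\mathcal{U}$ on RB $j$ is given by

$$p^{\mathrm{sic}}(\mathcal{U}, j) = p^{\mathrm{su}}(u, j) + w_v \log\big(1 + \mathbf{h}_{v,j}^{\dagger}(\mathbf{I} + \mathbf{h}_{u,j}\mathbf{h}_{u,j}^{\dagger})^{-1}\mathbf{h}_{v,j}\big). \tag{18}$$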
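The three metrics (16) to (18) can be evaluated directly for a single-antenna user pair. The sketch below is ours and rests on a few assumptions: channels are plain Python lists of complex numbers, logarithms are base 2 (bits/s/Hz), and the matrix inverses are replaced by the Sherman-Morrison identity $(\mathbf{I} + \mathbf{h}\mathbf{h}^{\dagger})^{-1} = \mathbf{I} - \mathbf{h}\mathbf{h}^{\dagger}/(1 + \mathbf{h}^{\dagger}\mathbf{h})$, which is exact for these rank-one interference terms. For any channel draw, $p^{\mathrm{sic}}$ should be at least each single-user metric and the MMSE pair metric, and at most the sum of the single-user metrics, consistent with (42) and (44) in Appendix B.

```python
import math
import random


def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors (lists)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def quad_inv(x, h):
    """x^H (I + h h^H)^{-1} x, using Sherman-Morrison for the rank-one term."""
    return (inner(x, x) - abs(inner(h, x)) ** 2 / (1 + inner(h, h))).real


def p_su(w, h):
    """(16): weighted rate of a user scheduled alone on the RB."""
    return w * math.log2(1 + inner(h, h).real)


def p_mmse(wu, hu, wv, hv):
    """(17): each user of the pair treats the other as interference."""
    return wu * math.log2(1 + quad_inv(hu, hv)) + wv * math.log2(1 + quad_inv(hv, hu))


def p_sic(wu, hu, wv, hv):
    """(18): the larger-weight user is decoded last and sees no interference."""
    if wu < wv:
        wu, hu, wv, hv = wv, hv, wu, hu
    return p_su(wu, hu) + wv * math.log2(1 + quad_inv(hv, hu))


rng = random.Random(1)


def rayleigh(n_rx=4):
    """Flat i.i.d. CN(0, 1) channel vector; n_rx = 4 as in the Section V setup."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(n_rx)]


hu, hv = rayleigh(), rayleigh()
wu, wv = 1.0, 0.6
su_u, su_v = p_su(wu, hu), p_su(wv, hv)
mmse, sic = p_mmse(wu, hu, wv, hv), p_sic(wu, hu, wv, hv)
print(f"su: {su_u:.3f}, {su_v:.3f}  mmse: {mmse:.3f}  sic: {sic:.3f}")
```

The rank-one shortcut keeps the example dependency-free; with multiple receive antennas per interferer or more co-scheduled users, a general linear solve would be needed instead.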

We are now ready to offer our user pre-selection rules, where a pool of $\hat{K}$ users must be selected from the $K$ active users. Notice that, to reduce complexity, all rules neglect the contiguity and the complete overlap constraints.

1) The first rule simply selects the $\hat{K}$ users that offer the $\hat{K}$ largest single-user rates among $\big\{\sum_{j=1}^{N} p^{\mathrm{su}}(u, j)\big\}_{u=1}^{K}$.

2) The second rule assumes that each RB can be assigned to at most one user. Then, if a user subset $\mathcal{A} \subseteq \{1, \cdots, K\}$ is selected, the system weighted sum-rate is given by

$$f(\mathcal{A}) = \sum_{j=1}^{N} \max_{u \in \mathcal{A}}\{p^{\mathrm{su}}(u, j)\}. \tag{19}$$

It can be shown that $f : 2^{\{1, \cdots, K\}} \to \mathbb{R}_+$ is a monotonic sub-modular set function [16]. As a result, the user pre-selection problem

$$\arg\max_{\mathcal{A} \subseteq \{1, \cdots, K\} : |\mathcal{A}| \leq \hat{K}} \{f(\mathcal{A})\} \tag{20}$$

can be sub-optimally solved by adapting a simple greedy algorithm [24], which offers a half approximation.

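3) The third rule assumes that each RB can be assigned to at most two users and that the BS employs the MMSE receiver. Then, if a user subset $\mathcal{A} \subseteq \{1, \cdots, K\}$ is selected, the system weighted sum-rate is given by

$$g(\mathcal{A}) = \sum_{j=1}^{N} \max\Big\{\max_{u \in \mathcal{A}}\{p^{\mathrm{su}}(u, j)\},\ \max_{\mathcal{U} \subseteq \mathcal{A} : |\mathcal{U}| = 2}\{p^{\mathrm{mmse}}(\mathcal{U}, j)\}\Big\}. \tag{21}$$

It can be shown that $g : 2^{\{1, \cdots, K\}} \to \mathbb{R}_+$ is a monotonic set function, but unfortunately it need not be sub-modular. Nevertheless, we proceed to employ the greedy algorithm to sub-optimally solve

$$\arg\max_{\mathcal{A} \subseteq \{1, \cdots, K\} : |\mathcal{A}| \leq \hat{K}} \{g(\mathcal{A})\}. \tag{22}$$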

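4) The fourth rule also assumes that each RB can be assigned to at most two users, but that the BS employs the SIC receiver. However, even upon replacing $p^{\mathrm{mmse}}(\mathcal{U}, j)$ in (21) with $p^{\mathrm{sic}}(\mathcal{U}, j)$, the resulting set function need not be sub-modular. As a result, we use a different metric. In particular, for a user subset $\mathcal{A} \subseteq \{1, \cdots, K\}$ we employ the metric

$$\begin{aligned} h(\mathcal{A}) &= \sum_{j=1}^{N} \sum_{\substack{\mathcal{U} = (u, v) : u \leq v \\ u, v \in \{1, \cdots, K\}}} \Big(p^{\mathrm{su}}(\mathcal{U} \cap \mathcal{A}, j)\,\chi(|\mathcal{U} \cap \mathcal{A}| = 1) + p^{\mathrm{sic}}(\mathcal{U}, j)\,\chi(|\mathcal{U} \cap \mathcal{A}| = 2)\Big) \\ &= \sum_{j=1}^{N} \Big(\sum_{u \in \mathcal{A}} (K - |\mathcal{A}| + 1)\, p^{\mathrm{su}}(u, j) + \sum_{\substack{\mathcal{U} = (u, v) : u < v \\ \mathcal{U} \subseteq \mathcal{A}}} p^{\mathrm{sic}}(\mathcal{U}, j)\Big). \end{aligned} \tag{23}$$

Notice that for any $\mathcal{A}$, $h(\mathcal{A})$ represents the system weighted sum-rate when time-sharing is employed by the system, wherein in each slot only a particular user, or the two distinct users of a particular pair in $\{1, \cdots, K\}$, are allowed to be scheduled. Then, a key result, which is proved in Appendix B, is the following.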
Proposition 2. The set function $h(\cdot)$ defined in (23) is a monotonic sub-modular set function. Thus the problem

$$\arg\max_{\mathcal{A} \subseteq \{1, \cdots, K\} : |\mathcal{A}| \leq \hat{K}} \{h(\mathcal{A})\} \tag{24}$$

can be solved sub-optimally (with a 1/2 approximation) by a simple greedy algorithm.
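Rules 1, 2 and 4 can be sketched in Python as follows. This is an illustrative implementation of ours, not the authors' code: the metric tables `p_su[u][j]` and `p_sic[(u, v)][j]` (for $u < v$) are assumed to be precomputed, e.g., from (16) and (18), users are indexed from 0, and `greedy_select` is the standard greedy algorithm that repeatedly adds the user with the largest marginal gain until $\hat{K}$ users are selected.

```python
def rule1(p_su, k_hat):
    """Rule 1: the k_hat users with the largest summed single-user rates."""
    totals = [sum(row) for row in p_su]
    return set(sorted(range(len(p_su)), key=lambda u: totals[u], reverse=True)[:k_hat])


def f_rule2(p_su):
    """(19): every RB is given to the best selected user."""
    n_rb = len(p_su[0])
    return lambda A: sum(max((p_su[u][j] for u in A), default=0.0) for j in range(n_rb))


def h_rule4(p_su, p_sic):
    """(23): sum over slots U = (u, v), u <= v, of the rate of U restricted to A."""
    K, n_rb = len(p_su), len(p_su[0])

    def h(A):
        total = 0.0
        for j in range(n_rb):
            for u in range(K):
                for v in range(u, K):
                    inside = sorted({u, v} & A)
                    if len(inside) == 1:
                        total += p_su[inside[0]][j]
                    elif len(inside) == 2:
                        total += p_sic[(u, v)][j]
        return total

    return h


def greedy_select(objective, K, k_hat):
    """Greedy maximization of a monotone set function subject to |A| <= k_hat."""
    A = set()
    for _ in range(min(k_hat, K)):
        base = objective(A)
        gains = {u: objective(A | {u}) - base for u in range(K) if u not in A}
        A.add(max(gains, key=gains.get))
    return A


p_su = [[3.0, 0.5], [0.5, 3.0], [2.0, 2.0]]   # K = 3 users, N = 2 RBs
p_sic = {(0, 1): [3.2, 3.2], (0, 2): [3.5, 2.3], (1, 2): [2.2, 3.4]}
f = f_rule2(p_su)
print(rule1(p_su, 2))
print(greedy_select(f, 3, 2), f({0, 1}))  # greedy reaches f = 5; f({0, 1}) = 6
print(greedy_select(h_rule4(p_su, p_sic), 3, 2))
```

The toy instance shows that greedy need not be optimal: for rule 2 it picks the "all-round" user 2 first and ends at $f = 5$, whereas $\{0, 1\}$ attains 6, which is still within the half-approximation guarantee.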
As a benchmark to compare the performance of the proposed user pre-selection algorithms, we can consider the case where LRT MU scheduling is employed without user pre-selection, but where an additional knapsack constraint is used to enforce the limit on the number of users that can be scheduled in an interval. It can be verified that this can be achieved by defining a knapsack constraint in (3) as $\beta^{1}(\mathcal{U}, c) = |\mathcal{U}|/\hat{K}, \ \forall (\mathcal{U}, c) \in \mathcal{M}$.

VIII. System Level Simulation Results 

We now present the performance of our MU scheduling algorithms via detailed system-level simulations. The simulation parameters conform to those used in 3GPP LTE evaluations and are given in Table IV. In all cases, inter-cell interference suppression via interference rejection combining (IRC) is employed by each base station (BS).

We first consider the case when each cell (or sector) has an average of 10 users and there are no knapsack constraints. In Table V we report the cell average and cell edge spectral efficiencies. The percentage gains shown for the MU scheduling schemes are over the baseline LRT based single-user scheduling scheme. Note that for the first three scheduling schemes we employed the second phase described in Section IV. Also, we observed that the LRT based SU scheduling together with the second phase yields at least as good a performance (for both cell-edge and cell average throughputs) as the deterministic SU scheduling algorithms in [11], [18], so we have omitted results for the latter algorithms. As seen from Table V, MU scheduling in conjunction with an advanced SIC receiver at the BS can result in very significant gains in terms of cell average throughput (about 27%) along with good cell edge gains. For the simpler MMSE receiver, we see significant cell average throughput gains (about 18%) but a degraded cell edge performance. We note that it is possible to trade off a small fraction of the cell average gains for a large cell edge performance improvement by altering the PF rule. Finally, the last two reported schemes are based on the sequential LRT method described in Section VI. We notice that sequential LRT based scheduling provides significant cell average gains while retaining the cell edge performance of SU scheduling. Thus, the sequential LRT based scheduling method is an attractive way to trade off some cell average throughput gains for a reduction in complexity.

Next, in Tables VI and VII we consider LRT based MU scheduling, with the second phase described in Section IV, for the case when the BS employs the MMSE receiver and the case when it employs the SIC receiver, respectively. In each case we assume that an average of 15 users are present in each cell and at most 7 first-transmission users can be scheduled in each interval. Thus, a limit on the number of scheduled users might have to be enforced in each scheduling interval. As a benchmark, we enforce this constraint (if it is required) using one knapsack constraint as described in Section VII. Note that upon specializing the result in Theorem 1 (with $\mathcal{M}^{\mathrm{wide}} = \emptyset$, $T = 2$, $A = 0$ and $J = 1$) we see that the LRT based MU scheduling algorithm guarantees an approximation factor of 1/5. Then, we examine the scenario where a pool of $\hat{K} = 7$ users is pre-selected whenever the number of first-transmission users is larger than 7. The LRT based MU scheduling algorithm is then employed on this pool without any constraints. In Table VI we have used the first, second and third pre-selection rules from Section VII, whereas in Table VII we have used the first, second and fourth pre-selection rules. It is seen that the simple rule 1 provides superior performance compared to the benchmark. Indeed, it is attractive since it involves the computation of only single-user metrics. The other rule with this feature (rule 2), however, provides much less improvement, mainly because it is much more aligned with single-user scheduling. Rules 3 and 4 involve the computation of metrics that involve user pairing and hence incur higher complexity. For the MMSE receiver, the gain of rule 3 over rule 1 is marginal, mainly because the metric in rule 3 is not sub-modular and hence cannot be well optimized by the simple greedy rule. On the other hand, for the SIC receiver, the gain of rule 4 over rule 1 is larger because the metric used in rule 4 is indeed sub-modular and hence can be well optimized by the simple greedy rule.

IX. Conclusions 

We considered resource allocation in the 3GPP LTE cellular uplink, which allows for transmit antenna selection for each scheduled user as well as multi-user scheduling, wherein multiple users can be assigned the same time-frequency resource. We showed that the resulting resource allocation problem, which must comply with several practical constraints, is APX-hard. We then proposed constant-factor polynomial-time approximation algorithms and demonstrated their performance via simulations.
Appendix A
Proof of Theorem 1

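Let us specialize (3) to instances where all the knapsack constraints are vacuous, where $L = K$ and $\mathcal{Q}_s = \{s\}, \ 1 \leq s \leq K$, and where $p(\mathcal{U}, c) = 0$ whenever $|\mathcal{U}| \geq 2$, for all $(\mathcal{U}, c) \in \mathcal{M}$. Then (3) reduces to the SU scheduling problem considered in [6], [11], which was shown there to be APX-hard. Consequently, we can assert that (3) is APX-hard.

Table I

Algorithm I: Algorithm for LTE UL MU-MIMO
1: Input $p(\mathcal{U}, c), \ \forall (\mathcal{U}, c) \in \mathcal{M}$, and $\mathcal{M}^{\mathrm{narrow}}, \mathcal{M}^{\mathrm{wide}}$.
2: Determine a feasible allocation over $\mathcal{M}^{\mathrm{narrow}}$ using Algorithm IIa and let $W^{\mathrm{narrow}}$ denote the corresponding weighted sum rate.
3: Determine a feasible allocation over $\mathcal{M}^{\mathrm{wide}}$ using Algorithm IIb and let $W^{\mathrm{wide}}$ denote the corresponding weighted sum rate.
4: Select and output the allocation resulting in $W = \max\{W^{\mathrm{narrow}}, W^{\mathrm{wide}}\}$.

Table II

Algorithm IIa: LRT based module over $\mathcal{M}^{\mathrm{narrow}}$
1: Initialize $p'(\mathcal{U}, c) \leftarrow p(\mathcal{U}, c), \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$, stack $S = \emptyset$
2: For $j = 1, \cdots, N$
3:   Determine $(\mathcal{U}^*, c^*) = \arg\max_{(\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}} : \mathrm{Tail}(c) = j} p'(\mathcal{U}, c)$
4:   If $p'(\mathcal{U}^*, c^*) > 0$ Then
5:     Set $\rho = p'(\mathcal{U}^*, c^*)$ and Push $(\mathcal{U}^*, c^*)$ into $S$.
6:     For each $(\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$ such that $p'(\mathcal{U}, c) > 0$
7:       If $\exists\, \mathcal{Q}_s : \mathcal{U} \cap \mathcal{Q}_s \neq \emptyset \ \& \ \mathcal{U}^* \cap \mathcal{Q}_s \neq \emptyset$ or $c^* \cap c \neq \emptyset$ Then
8:         Update $p'(\mathcal{U}, c) \leftarrow p'(\mathcal{U}, c) - \rho$
9:       Else If $\exists\, q \in \mathcal{I} : \alpha^q(\mathcal{U}, c) = \alpha^q(\mathcal{U}^*, c^*) = 1$ Then
10:        Update $p'(\mathcal{U}, c) \leftarrow p'(\mathcal{U}, c) - \rho$
11:      Else
12:        Update $p'(\mathcal{U}, c) \leftarrow p'(\mathcal{U}, c) - 2\rho \max_{1 \leq q \leq J} \beta^q(\mathcal{U}, c)$.
13:      End If
14:    End For
15:  End If
16: End For
17: Set stack $S' = \emptyset$
18: While $S \neq \emptyset$
19:   Obtain $(\mathcal{U}, c) = $ Pop $S$
20:   If $(\mathcal{U}, c) \cup S'$ is valid Then %% $(\mathcal{U}, c) \cup S'$ is deemed valid if no user in $\mathcal{U}$ is incompatible with any user present in $S'$, no chunk in $S'$ overlaps $c$, and all knapsack constraints are satisfied by $(\mathcal{U}, c) \cup S'$.
21:     Update $S' \leftarrow (\mathcal{U}, c) \cup S'$
22: End While
23: Output $S'$ and $W^{\mathrm{narrow}} = \sum_{(\mathcal{U}, c) \in S'} p(\mathcal{U}, c)$.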
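For intuition, Algorithm IIa can be sketched in Python for the special case without knapsack constraints ($J = A = 0$, as in the single-cell setup of Section V), where steps 9 to 12 subtract nothing and the validity test in step 20 reduces to the incompatibility and chunk-overlap checks. This is an illustrative sketch under our own assumptions (pairs keyed as (frozenset of users, (head, tail)), incompatibility groups $\mathcal{Q}_s$ given as Python sets, hypothetical function name `lrt_narrow`), not the authors' implementation.

```python
def lrt_narrow(p, groups, n_rb):
    """Sketch of Algorithm IIa with no knapsack constraints (J = A = 0).

    p: dict {(frozenset_of_users, (head, tail)): metric}, RBs indexed 1..n_rb
    groups: list of sets Q_s; two pairs conflict if both hit the same Q_s
    """
    def overlaps(c1, c2):
        return c1[0] <= c2[1] and c2[0] <= c1[1]

    def conflict(users1, users2):
        return any(users1 & q and users2 & q for q in groups)

    residual = dict(p)  # p'(U, c) in Table II
    stack = []
    for j in range(1, n_rb + 1):                       # steps 2-16
        ending = [x for x in p if x[1][1] == j]         # pairs with Tail(c) = j
        if not ending:
            continue
        best = max(ending, key=lambda x: residual[x])
        rho = residual[best]
        if rho <= 0:
            continue
        stack.append(best)
        for x in p:
            if residual[x] > 0 and (conflict(x[0], best[0]) or overlaps(x[1], best[1])):
                residual[x] -= rho                      # step 8
            # steps 9-12 subtract nothing when there are no knapsack constraints
    chosen = []
    while stack:                                       # steps 17-22
        x = stack.pop()
        if all(not conflict(x[0], y[0]) and not overlaps(x[1], y[1]) for y in chosen):
            chosen.append(x)
    return chosen, sum(p[x] for x in chosen)


p = {
    (frozenset({0}), (1, 1)): 3.0,
    (frozenset({1}), (1, 1)): 2.0,
    (frozenset({0, 1}), (1, 1)): 4.0,
    (frozenset({0}), (2, 2)): 1.0,
    (frozenset({1}), (2, 2)): 5.0,
    (frozenset({0}), (1, 2)): 3.5,
    (frozenset({0, 1}), (2, 2)): 5.5,
}
groups = [{0}, {1}]  # Q_s = {s}: each user may appear in at most one pair
chosen, w = lrt_narrow(p, groups, n_rb=2)
print(chosen, w)
```

In this toy instance the sketch returns the pair of both users on RB 2 (weighted sum rate 5.5), while the optimum is 8 (user 0 alone on RB 1 and user 1 alone on RB 2); the outcome is suboptimal but respects the worst-case factor $1/(1+T) = 1/3$ for $T = 2$.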

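Table III

Algorithm IIb: Greedy module over $\mathcal{M}^{\mathrm{wide}}$
1: Input $p(\mathcal{U}, c), \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{wide}}$, and $\{\mathcal{V}^q\}_{q=1}^{J}$.
2: Set $S = \emptyset$ and $\mathcal{M}' = \mathcal{M}^{\mathrm{wide}}$.
3: Repeat
4:   Determine $(\mathcal{U}^*, c^*) = \arg\max_{(\mathcal{U}, c) \in \mathcal{M}'} p(\mathcal{U}, c)$.
5:   Update $S \leftarrow S \cup (\mathcal{U}^*, c^*)$ and $\mathcal{M}' = \mathcal{M}' \setminus \{\mathcal{V}^q : (\mathcal{U}^*, c^*) \in \mathcal{V}^q\}$.
6: Until $(\mathcal{U}^*, c^*) = \emptyset$ or $\mathcal{M}' = \emptyset$.
7: Output $S$ and $W^{\mathrm{wide}} = \sum_{(\mathcal{U}, c) \in S} p(\mathcal{U}, c)$.

Next, consider first Algorithm IIa, which outputs a feasible allocation over $\mathcal{M}^{\mathrm{narrow}}$ yielding a weighted sum rate $W^{\mathrm{narrow}}$, and let $W^{\mathrm{opt,narrow}}$ denote the optimal weighted sum rate obtained by solving (3) albeit where all pairs $(\mathcal{U}, c)$ are restricted to lie in $\mathcal{M}^{\mathrm{narrow}}$. We will prove that

$$\frac{W^{\mathrm{narrow}}}{W^{\mathrm{opt,narrow}}} \geq \frac{1}{1 + T + A + 2J}. \tag{25}$$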
We present a proof that invokes the notation and results developed for LRT based SU scheduling in [6] as much as possible, and mainly highlights the key differences. These differences are novel and crucial since they allow us to co-schedule multiple users on a chunk while respecting incompatibility constraints, and to satisfy multiple knapsack constraints.

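Note that Algorithm IIa builds up the stack $S$ in steps. In particular, let $S_j, \ j = 1, \cdots, N$, be the element that is added in the $j$-th step and note that either $S_j = \emptyset$ or it is equal to some pair $(\mathcal{U}_j^*, c_j^*)$. As in [6], we use two functions $p_1^{(j)} : \mathcal{M}^{\mathrm{narrow}} \to \mathbb{R}_+$ and $p_2^{(j)} : \mathcal{M}^{\mathrm{narrow}} \to \mathbb{R}$ for $j = 0, \cdots, N$ to track the function $p'(\cdot)$ as the stack $S$ is being built up over $N$ steps. In particular, we set $p_1^{(0)}(\mathcal{U}, c) = 0, \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$, and $p_2^{(0)}(\mathcal{U}, c) = p(\mathcal{U}, c), \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$. For our problem at hand, we define $\{p_1^{(j)}(\mathcal{U}, c), p_2^{(j)}(\mathcal{U}, c)\}$ recursively as

$$p_1^{(j)}(\mathcal{U}, c) = \begin{cases} \big(p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)\big)^+ \chi\big(p_2^{(j-1)}(\mathcal{U}, c) > 0\big), & \text{if } c_j^* \cap c \neq \emptyset, \\ \big(p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)\big)^+ \chi\big(p_2^{(j-1)}(\mathcal{U}, c) > 0\big), & \text{else if } \exists\, \mathcal{Q}_s : \mathcal{U} \cap \mathcal{Q}_s \neq \emptyset \ \& \ \mathcal{U}_j^* \cap \mathcal{Q}_s \neq \emptyset, \\ \big(p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)\big)^+ \chi\big(p_2^{(j-1)}(\mathcal{U}, c) > 0\big), & \text{else if } \exists\, q \in \mathcal{I} : \alpha^q(\mathcal{U}, c) = \alpha^q(\mathcal{U}_j^*, c_j^*) = 1, \\ 2\big(p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)\big)^+ \chi\big(p_2^{(j-1)}(\mathcal{U}, c) > 0\big) \max_{1 \leq q \leq J} \beta^q(\mathcal{U}, c), & \text{otherwise}, \end{cases}$$

$$p_2^{(j)}(\mathcal{U}, c) = p_2^{(j-1)}(\mathcal{U}, c) - p_1^{(j)}(\mathcal{U}, c), \tag{26}$$

where $(x)^+ = \max\{x, 0\}, \ x \in \mathbb{R}$, $\chi(\cdot)$ denotes the indicator function, and $(\mathcal{U}_j^*, c_j^*) = \arg\max_{(\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}} : \mathrm{Tail}(c) = j} p_2^{(j-1)}(\mathcal{U}, c)$. Hence, we have that

$$p_2^{(j-1)}(\mathcal{U}, c) = p_1^{(j)}(\mathcal{U}, c) + p_2^{(j)}(\mathcal{U}, c), \quad \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}, \ j = 1, \cdots, N. \tag{27}$$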
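It can be noted that

$$\begin{aligned} & p_2^{(j)}(\mathcal{U}, c) \leq 0, && \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}} : \mathrm{Tail}(c) \leq j, \\ & p_2^{(j')}(\mathcal{U}, c) \leq p_2^{(j)}(\mathcal{U}, c), && \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}} \ \& \ j' > j. \end{aligned} \tag{28}$$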

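Further, to track the stack $S'$ which is built in the while loop of the algorithm, we define stacks $\{S_j^*\}_{j=0}^{N}$, where $S_N^* = \emptyset$ and $S_j^*$ is the value of $S'$ after the algorithm has tried to add $\cup_{m=j+1}^{N} S_m$ to $S'$ (starting from $S' = \emptyset$), so that $S_0^*$ is the stack $S'$ that is the output of the algorithm. Note that $S_{j+1}^* \subseteq S_j^* \subseteq S_{j+1}^* \cup S_{j+1}$. Next, for $j = 0, \cdots, N$, we let $W^{(j),\mathrm{opt}}$ denote the optimal solution to (3) but where $\mathcal{M}$ is replaced by $\mathcal{M}^{\mathrm{narrow}}$ and the function $p(\cdot)$ is replaced by $p_2^{(j)}(\cdot)$. Further, let $W^{(j)} = \sum_{(\mathcal{U}, c) \in S_j^*} p_2^{(j)}(\mathcal{U}, c)$ and note that $W^{\mathrm{opt,narrow}} = W^{(0),\mathrm{opt}}$ and $W^{\mathrm{narrow}} = W^{(0)}$. We will show by induction that

$$W^{(j),\mathrm{opt}} \leq (T + 1 + A + 2J)\, W^{(j)}, \quad \forall j = N, \cdots, 0, \tag{29}$$

which includes the claim in (25) at $j = 0$. The base case $W^{(N),\mathrm{opt}} \leq (T + 1 + A + 2J)\, W^{(N)}$ is readily true since $S_N^* = \emptyset$ and $p_2^{(N)}(\mathcal{U}, c) \leq 0, \ \forall (\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$. Assume that (29) holds for some $j$; we show that it then holds for $j - 1$. We focus only on the main case in which $S_j = (\mathcal{U}_j^*, c_j^*) \neq \emptyset$ (the remaining case holds trivially). Note that since $(\mathcal{U}_j^*, c_j^*)$ is added to the stack $S$ in the algorithm, $p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*) > 0$. Then, from the update formulas (26), we must have that $p_2^{(j)}(\mathcal{U}_j^*, c_j^*) = 0$. Using the fact that $S_{j-1}^* \subseteq S_j^* \cup \{(\mathcal{U}_j^*, c_j^*)\}$ together with the induction hypothesis, we can conclude that

$$\frac{W^{(j),\mathrm{opt}}}{T + 1 + A + 2J} \leq \sum_{(\mathcal{U}, c) \in S_j^*} p_2^{(j)}(\mathcal{U}, c) = \sum_{(\mathcal{U}, c) \in S_{j-1}^*} p_2^{(j)}(\mathcal{U}, c). \tag{30}$$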



Next, we will show that

$$\sum_{(\mathcal{U}, c) \in S_{j-1}^*} p_1^{(j)}(\mathcal{U}, c) \geq p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*). \tag{31}$$

Towards this end, suppose that $S_{j-1}^* = S_j^* \cup \{(\mathcal{U}_j^*, c_j^*)\}$. Then, recalling (26), we can deduce that (31) is true since $p_1^{(j)}(\mathcal{U}_j^*, c_j^*) = p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)$. Suppose now that $S_{j-1}^* = S_j^*$. In this case there are two possibilities. In the first one, $(\mathcal{U}_j^*, c_j^*)$ cannot be added to $S_j^*$ due to the presence of a pair $(\mathcal{U}', c') \in S_j^*$ for which at least one of these three conditions is satisfied: $\exists\, \mathcal{Q}_s : \mathcal{U}' \cap \mathcal{Q}_s \neq \emptyset \ \& \ \mathcal{U}_j^* \cap \mathcal{Q}_s \neq \emptyset$; $c' \cap c_j^* \neq \emptyset$; or $\exists\, q \in \mathcal{I} : \alpha^q(\mathcal{U}', c') = \alpha^q(\mathcal{U}_j^*, c_j^*) = 1$. Since any pair $(\mathcal{U}', c') \in S_j^*$ was added to $S$ in the algorithm after the $j$-th step, from the second inequality in (28) we must have that $p_2^{(j-1)}(\mathcal{U}', c') > 0$. Recalling (26), we can then deduce that $p_1^{(j)}(\mathcal{U}', c') = p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)$, which proves (31). In the second possibility, $(\mathcal{U}_j^*, c_j^*)$ cannot be added to $S_j^*$ due to a generic knapsack constraint being violated. In other words, for some $q \in \{1, \cdots, J\}$ we have that



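$$\sum_{(\mathcal{U}, c) \in S_j^*} \beta^q(\mathcal{U}, c) > 1 - \beta^q(\mathcal{U}_j^*, c_j^*). \tag{32}$$

Since $(\mathcal{U}_j^*, c_j^*) \in \mathcal{M}^{\mathrm{narrow}}$, we have $\beta^q(\mathcal{U}_j^*, c_j^*) \leq 1/2$, so that

$$2 \sum_{(\mathcal{U}, c) \in S_j^*} \max_{1 \leq q' \leq J} \beta^{q'}(\mathcal{U}, c) \geq 2 \sum_{(\mathcal{U}, c) \in S_j^*} \beta^q(\mathcal{U}, c) > 1, \tag{33}$$

which, along with (26), also proves (31). Thus, we have established the claim in (31).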

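Finally, letting $V^{(j),\mathrm{opt}}$ denote the optimal solution to (3) but where $\mathcal{M}$ is replaced by $\mathcal{M}^{\mathrm{narrow}}$ and the function $p(\cdot)$ is replaced by $p_1^{(j)}(\cdot)$, we will show that

$$p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*) \geq \frac{V^{(j),\mathrm{opt}}}{T + 1 + A + 2J}. \tag{34}$$

Towards this end, from (26) we note that for any pair $(\mathcal{U}, c) \in \mathcal{M}^{\mathrm{narrow}}$, $p_1^{(j)}(\mathcal{U}, c) \leq p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)$. Let $\mathcal{V}^{(j)}$ be an optimal allocation of pairs that results in $V^{(j),\mathrm{opt}}$. For any two pairs $(\mathcal{U}_1, c_1), (\mathcal{U}_2, c_2) \in \mathcal{V}^{(j)}$ we must have that, for each $1 \leq s \leq L$, at least one of $\mathcal{U}_1 \cap \mathcal{Q}_s$ and $\mathcal{U}_2 \cap \mathcal{Q}_s$ is $\emptyset$, as well as $c_1 \cap c_2 = \emptyset$. In addition, $|\mathcal{U}_1|$ and $|\mathcal{U}_2|$ are no greater than $T$. Thus, we can have at most $T$ pairs $(\mathcal{U}, c)$ in $\mathcal{V}^{(j)}$ for which $\exists\, \mathcal{Q}_s : \mathcal{U} \cap \mathcal{Q}_s \neq \emptyset \ \& \ \mathcal{U}_j^* \cap \mathcal{Q}_s \neq \emptyset$. Further, using the first inequality in (28), we see that any pair $(\mathcal{U}, c)$ for which $c \cap c_j^* \neq \emptyset$ and $p_1^{(j)}(\mathcal{U}, c) = p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)$ must have $\mathrm{Tail}(c) \geq j$, so that $j \in c$. Thus, $\mathcal{V}^{(j)}$ can include at most one pair $(\mathcal{U}, c)$ for which $c \cap c_j^* \neq \emptyset$. Next, there can be at most $A$ constraints in $\mathcal{I}$ for which $\alpha^q(\mathcal{U}_j^*, c_j^*) = 1, \ q \in \mathcal{I}$, is satisfied. For each such constraint $q \in \mathcal{I}$ we can pick at most one pair $(\mathcal{U}, c)$ for which $\alpha^q(\mathcal{U}, c) = 1$ and $p_1^{(j)}(\mathcal{U}, c) = p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*)$. Thus, $\mathcal{V}^{(j)}$ can include at most $A$ such pairs, one for each constraint. Now the remaining pairs in $\mathcal{V}^{(j)}$ (whose users do not intersect $\mathcal{U}_j^*$, whose chunks do not intersect $c_j^*$, and which do not violate any binary knapsack constraint in the presence of $(\mathcal{U}_j^*, c_j^*)$) must satisfy the generic knapsack constraints. Let these pairs form the set $\bar{\mathcal{V}}^{(j)}$, so that

$$\sum_{(\mathcal{U}, c) \in \bar{\mathcal{V}}^{(j)}} p_1^{(j)}(\mathcal{U}, c) \leq 2\, p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*) \sum_{q=1}^{J} \sum_{(\mathcal{U}, c) \in \bar{\mathcal{V}}^{(j)}} \beta^q(\mathcal{U}, c) \leq 2J\, p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*).$$

Combining these observations, we have that

$$V^{(j),\mathrm{opt}} = \sum_{(\mathcal{U}, c) \in \mathcal{V}^{(j)}} p_1^{(j)}(\mathcal{U}, c) \leq (1 + T + A + 2J)\, p_2^{(j-1)}(\mathcal{U}_j^*, c_j^*), \tag{35}$$

which is the desired result in (34).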

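Thus, using (30), (31) and (34), we can conclude that

$$(1 + T + A + 2J) \sum_{(\mathcal{U}, c) \in S_{j-1}^*} \big(p_1^{(j)}(\mathcal{U}, c) + p_2^{(j)}(\mathcal{U}, c)\big) \geq V^{(j),\mathrm{opt}} + W^{(j),\mathrm{opt}} \geq W^{(j-1),\mathrm{opt}}, \tag{36}$$

where, by (27), the left-hand side equals $(1 + T + A + 2J)\, W^{(j-1)}$, and the second inequality holds because, again by (27), the value of any allocation under $p_2^{(j-1)}$ splits into its values under $p_1^{(j)}$ and $p_2^{(j)}$. This proves the induction step and hence the claim in (25).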

Let us now consider the remaining part, which arises when $\mathcal{M}^{\mathrm{wide}} \neq \emptyset$. Consider first Algorithm IIb, which outputs a feasible allocation over $\mathcal{M}^{\mathrm{wide}}$ yielding a weighted sum rate $W^{\mathrm{wide}}$. Let $W^{\mathrm{opt,wide}}$ denote the optimal weighted sum rate obtained by solving (3) albeit where all pairs $(\mathcal{U}, c)$ are restricted to lie in $\mathcal{M}^{\mathrm{wide}}$. We will prove that

$$W^{\mathrm{wide}} \geq \frac{W^{\mathrm{opt,wide}}}{J}. \tag{37}$$

Let $\mathcal{V}^{\mathrm{opt,wide}}$ be an optimal allocation of pairs from $\mathcal{M}^{\mathrm{wide}}$ that results in a weighted sum rate $W^{\mathrm{opt,wide}}$. Clearly, in order to meet the knapsack constraints, $\mathcal{V}^{\mathrm{opt,wide}}$ can include at most one pair from each $\mathcal{V}^q, \ 1 \leq q \leq J$, so that there can be at most $J$ pairs in $\mathcal{V}^{\mathrm{opt,wide}}$. Thus, by selecting the pair yielding the maximum weighted sum rate, we can achieve at least $W^{\mathrm{opt,wide}}/J$. The greedy algorithm first selects the pair yielding the maximum weighted sum rate among all pairs in $\mathcal{M}^{\mathrm{wide}}$ and then attempts to add pairs so as to monotonically improve the objective. Thus, we can conclude that (37) must be true.
Notice that we select $W = \max\{W^{\mathrm{narrow}}, W^{\mathrm{wide}}\}$, so that

$$W \geq \max\left\{\frac{W^{\mathrm{opt,narrow}}}{1 + T + A + 2J}, \frac{W^{\mathrm{opt,wide}}}{J}\right\}. \tag{38}$$

It is readily seen that

$$W^{\mathrm{opt}} \leq W^{\mathrm{opt,narrow}} + W^{\mathrm{opt,wide}}. \tag{39}$$

(38) and (39) together prove the theorem.
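Spelled out, the final combination is the following short calculation. It is a sketch of the arithmetic only and assumes that the statement of Theorem 1 (not reproduced in this excerpt) is the bound that (38) and (39) imply. From (38), $W^{\mathrm{opt,narrow}} \leq (1+T+A+2J)\,W$ and $W^{\mathrm{opt,wide}} \leq J\,W$, so that (39) gives

$$W^{\mathrm{opt}} \leq (1+T+A+2J)\,W + J\,W = (1+T+A+3J)\,W \quad\Longrightarrow\quad \frac{W}{W^{\mathrm{opt}}} \geq \frac{1}{1+T+A+3J}.$$

When $\mathcal{M}^{\mathrm{wide}} = \emptyset$ we have $W^{\mathrm{opt,wide}} = 0$ and $W = W^{\mathrm{narrow}}$, and the factor reduces to $1/(1+T+A+2J)$, the value specialized in Sections V and VIII (1/3 and 1/5, respectively).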
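Appendix B
Proof of Proposition 2

Proof: On any RB $j$, consider any fixed pair $\mathcal{U} = \{u, v\} \subseteq \{1, \cdots, K\}$ and define the set function

$$g(\mathcal{A}) = p^{\mathrm{su}}(\mathcal{U} \cap \mathcal{A}, j)\,\chi(|\mathcal{U} \cap \mathcal{A}| = 1) + p^{\mathrm{sic}}(\mathcal{U}, j)\,\chi(|\mathcal{U} \cap \mathcal{A}| = 2), \quad \forall \mathcal{A} \subseteq \{1, \cdots, K\}. \tag{40}$$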

Our first aim is to prove that g{.) defined above is a monotonic sub-modular set function. First, note that the 
weighted sum rate in ( fTSl ) can also be written as, 



P^'^iU^j) = {Wu - Wi,)p'''{u,j) + Wi, log 



I + + ^n^j^h 



(41) 



>p^"(i), j) + Wu log(l + 4, .(I + h^jhj, .)-ih^j) (42) 

so that p^^'^iUjj) > max{p™('u, j)}, which suffices to prove the monotonicity of g{.). Then, to prove 

sub-modularity we must show that, 

g{AU{q})-g{A)>g{BU{q})-g{B), V^C^?C{1,... ,K}kqe{l,--- ,K}\B. (43) 

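To prove (43), consider any $\mathcal{A} \subseteq \mathcal{B} \subseteq \{1, \cdots, K\}$, so that $\mathcal{A}\cap\mathcal{U} \subseteq \mathcal{B}\cap\mathcal{U}$, and consider the following cases. First, consider the case $|\mathcal{A}\cap\mathcal{U}| = |\mathcal{B}\cap\mathcal{U}|$, which implies that $\mathcal{A}$ and $\mathcal{B}$ contain the same user(s) from $\mathcal{U}$, so that (43) must hold with equality. Next, suppose $|\mathcal{A}\cap\mathcal{U}| < |\mathcal{B}\cap\mathcal{U}|$. In this case, upon exploiting the inequality

$$ p^{\rm mu}(\mathcal{U}, j) \le p^{\rm su}(u, j) + p^{\rm su}(v, j), \tag{44} $$

together with the fact that $g(\mathcal{B}\cup\{q\}) - g(\mathcal{B}) = 0$ when $|\mathcal{B}\cap\mathcal{U}| = 2$, we can conclude that (43) must hold. Indeed, if $q \notin \mathcal{U}$ or $|\mathcal{B}\cap\mathcal{U}| = 2$, the right-hand side of (43) is zero while its left-hand side is non-negative by monotonicity; otherwise $\mathcal{A}\cap\mathcal{U} = \emptyset$ and $\mathcal{B}\cap\mathcal{U} = \mathcal{U}\setminus\{q\}$, so (44) gives $g(\mathcal{B}\cup\{q\}) - g(\mathcal{B}) = p^{\rm mu}(\mathcal{U}, j) - p^{\rm su}(\mathcal{U}\setminus\{q\}, j) \le p^{\rm su}(q, j) = g(\mathcal{A}\cup\{q\}) - g(\mathcal{A})$. Finally, since the set function $h(\cdot)$ in (23) is a linear combination of $NK(K+1)/2$ monotonic sub-modular set functions in which the combining coefficients are all positive, we can assert that it must be a monotonic sub-modular set function as well.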
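The monotonicity and sub-modularity of $g(\cdot)$ can also be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: unit noise power, $p^{\rm su}(u,j) = w_u\log_2(1 + \|\mathbf{h}_{u,j}\|^2)$, $p^{\rm mu}(\mathcal{U},j)$ in the form of (41), and random complex Gaussian channels with two receive antennas; all function names are hypothetical. For $K = 4$ users it enumerates every pair $\mathcal{U}$, every $\mathcal{A} \subseteq \mathcal{B}$ and every $q \notin \mathcal{B}$.

```python
import itertools
import math
import random

def su_rate(w, h):
    """Assumed weighted single-user rate w * log2(1 + ||h||^2), unit noise power."""
    return w * math.log2(1.0 + sum(abs(x) ** 2 for x in h))

def mu_rate(wu, hu, wv, hv):
    """Weighted sum rate of a user pair in the form of (41)."""
    if wu < wv:  # relabel so that w_u >= w_v
        wu, hu, wv, hv = wv, hv, wu, hu
    nu = sum(abs(x) ** 2 for x in hu)
    nv = sum(abs(x) ** 2 for x in hv)
    cross = sum(a.conjugate() * b for a, b in zip(hu, hv))  # h_u^H h_v
    # log2|I + hu hu^H + hv hv^H| via the 2x2 Gram determinant (Sylvester's identity)
    logdet = math.log2((1.0 + nu) * (1.0 + nv) - abs(cross) ** 2)
    return (wu - wv) * math.log2(1.0 + nu) + wv * logdet

def g(A, pair, w, H):
    """Set function (40) for a fixed pair on one RB."""
    inside = [x for x in pair if x in A]
    if len(inside) == 1:
        return su_rate(w[inside[0]], H[inside[0]])
    if len(inside) == 2:
        u, v = pair
        return mu_rate(w[u], H[u], w[v], H[v])
    return 0.0

def check_monotone_submodular(K=4, n_rx=2, seed=0, tol=1e-9):
    rng = random.Random(seed)
    w = [rng.uniform(0.1, 2.0) for _ in range(K)]
    H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_rx)]
         for _ in range(K)]
    users = range(K)
    subsets = [frozenset(c) for r in range(K + 1)
               for c in itertools.combinations(users, r)]
    for pair in itertools.combinations(users, 2):
        val = {S: g(S, pair, w, H) for S in subsets}
        for A in subsets:
            for B in subsets:
                if not A <= B:
                    continue
                if val[A] > val[B] + tol:  # monotonicity violated
                    return False
                for q in users:
                    if q in B:
                        continue
                    # diminishing returns, i.e. inequality (43)
                    if val[A | {q}] - val[A] < val[B | {q}] - val[B] - tol:
                        return False
    return True
```

A random check of this kind does not replace the proof, but it catches mistakes in an implementation of (41), for example forgetting to relabel the pair so that $w_u \ge w_v$.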

Appendix C 

Appendix: Modeling 3GPP LTE Control Channel Constraints 

Note that by placing restrictions on the location where a particular user's control packet can be sent and the 
size of that packet, the system can reduce the number of blind decoding attempts that have to be made by that 
user in order to receive its control packet. We note that a user is unaware of whether there is a control packet 
intended for it and consequently must check all possible locations where its control packet could be present 
assuming each possible packet size. Each control packet carries a CRC bit sequence scrambled using the unique
user identifier which helps the user deduce whether the examined packet is meant for it [Tl. In the 3GPP LTE 
system, the minimum allocation unit in the downlink control channel is referred to as the control channel element 
(CCE). Let $\{1, \cdots, R\}$ be the set of CCEs available for conveying UL grants. A contiguous chunk of CCEs from $\{1, \cdots, R\}$ that can be assigned to a user is referred to as a PDCCH. The size of each PDCCH is referred to as its aggregation level and must belong to the set $\{1, 2, 4, 8\}$. Let $\mathcal{P}$ denote the set of all possible such PDCCHs. For each user, the BS first decides an aggregation level based on its average (long-term) SINR. Then, using that user's unique identifier (ID) together with its aggregation level, the BS obtains a small subset of non-overlapping PDCCHs from $\mathcal{P}$ (of cardinality no greater than 6) that are eligible to be assigned to that user. Let $\mathcal{P}_u$ denote this subset of eligible PDCCHs for a user $u$. Then, if user $u$ is scheduled, exactly one PDCCH from $\mathcal{P}_u$ must be assigned to it, i.e., must be used to convey its UL grant. Note that while the PDCCHs that belong to the eligible set of any one user are non-overlapping, those that belong to the eligible sets of any two different users can overlap. As a result, the BS scheduler must also enforce the constraint that the PDCCHs assigned to any two different scheduled users must not overlap.
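The eligible-set construction above can be sketched as follows. The text only states that the BS uses each user's ID and aggregation level; as an assumption, the sketch uses the UE-specific search-space hashing of 3GPP TS 36.213 (Sec. 9.1.1) as we recall it, namely $Y_k = (39827\,Y_{k-1}) \bmod 65537$ with $Y_{-1}$ the user's RNTI and 6, 6, 2, 2 candidates for aggregation levels 1, 2, 4, 8, so the constants should be checked against the specification. CCEs are indexed from 0 here. The function `eligible_pdcchs` returns $\mathcal{P}_u$ as a list of CCE-index tuples, and the hypothetical `assignment_is_valid` enforces both rules stated above: each scheduled user gets a PDCCH from its own eligible set, and no two users' PDCCHs overlap.

```python
# Assumed constants, as we recall them from 3GPP TS 36.213, Sec. 9.1.1.
HASH_A, HASH_D = 39827, 65537
NUM_CANDIDATES = {1: 6, 2: 6, 4: 2, 8: 2}  # UE-specific candidates per aggregation level

def eligible_pdcchs(rnti, level, num_cce, subframe=0):
    """Eligible PDCCHs (tuples of 0-based CCE indices) for one user."""
    y = rnti  # Y_{-1} is the user's (nonzero) RNTI
    for _ in range(subframe + 1):
        y = (HASH_A * y) % HASH_D  # Y_k = (A * Y_{k-1}) mod D
    num_slots = num_cce // level  # number of level-aligned start positions
    if num_slots == 0:
        return []
    pdcchs = []
    for m in range(NUM_CANDIDATES[level]):
        start = level * ((y + m) % num_slots)
        pdcch = tuple(range(start, start + level))
        if pdcch not in pdcchs:  # wrap-around can repeat a candidate
            pdcchs.append(pdcch)
    return pdcchs

def assignment_is_valid(assignment, eligible):
    """assignment: user -> PDCCH; eligible: user -> list of eligible PDCCHs."""
    used = set()
    for user, pdcch in assignment.items():
        if pdcch not in eligible[user]:
            return False  # PDCCH outside the user's eligible set
        if used.intersection(pdcch):
            return False  # overlaps a PDCCH assigned to another user
        used.update(pdcch)
    return True
```

Because candidates start at multiples of the aggregation level, the PDCCHs returned for one user never overlap, matching the property stated in the text, while candidates of different users can still collide.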

Next, the constraint that each scheduled user can be assigned only one PDCCH from its set of eligible PDCCHs can be enforced as follows. First, for each user $u$, $1 \le u \le K$, define a set $\mathcal{V}_u$ containing $|\mathcal{P}_u|$ virtual users, where each virtual user in $\mathcal{V}_u$ is associated with a unique PDCCH in $\mathcal{P}_u$, and all the parameters (such as uplink channels, queue size, etc.) corresponding to each virtual user in $\mathcal{V}_u$ are identical to those of user $u$. Let $\mathcal{U}$ be the set of all possible subsets of such virtual users such that each subset has a cardinality no greater than $T$ and contains no more than one virtual user corresponding to the same user. Defining $\mathcal{M} = \mathcal{U} \times \mathcal{C}$, we can then pose the original problem over $\mathcal{M}$ after setting $L = K$, with the $s$-th set of mutually incompatible users given by $\mathcal{V}_s$, $1 \le s \le K$. Consequently, by defining the virtual users corresponding to each user as being mutually incompatible, we have enforced the constraint that at most one virtual user for each user can be selected, which in turn is equivalent to enforcing that each scheduled user can be assigned only one PDCCH from its set of eligible PDCCHs.
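The virtual-user construction can be illustrated with a small sketch. All names are hypothetical: a virtual user is represented as a `(u, pdcch)` tuple, `T` is the cardinality bound from the text, and an allocation is a list of `(subset, chunk)` pairs. The function `respects_incompatibility` checks the effect of declaring the virtual users of each user mutually incompatible: across the whole allocation, each real user is represented by at most one virtual user, and hence by at most one PDCCH.

```python
from itertools import combinations

def virtual_users(eligible):
    """One virtual user (u, pdcch) per eligible PDCCH of each real user u."""
    return [(u, pdcch) for u, pdcchs in eligible.items() for pdcch in pdcchs]

def candidate_subsets(vusers, T):
    """All subsets of at most T virtual users with no two sharing a real user."""
    subsets = []
    for size in range(1, T + 1):
        for combo in combinations(vusers, size):
            owners = [u for u, _ in combo]
            if len(owners) == len(set(owners)):
                subsets.append(combo)
    return subsets

def respects_incompatibility(allocation):
    """allocation: list of (subset, chunk) pairs. At most one virtual user
    (hence at most one PDCCH) may be selected for each real user."""
    chosen = {}
    for subset, _chunk in allocation:
        for u, pdcch in subset:
            if chosen.setdefault(u, pdcch) != pdcch:
                return False
    return True
```

For example, a user with two eligible PDCCHs and a user with one yield three virtual users; with $T = 2$ there are three singletons and two admissible pairs, since the two virtual users of the first user may not be grouped together.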

Figure 1. A Feasible RB Allocation in the LTE UL
Figure 2. Average spectral efficiency versus SNR (dB): MU Scheduling with MMSE receiver.

Finally, consider the set of all eligible PDCCHs. Note that this set is determined by the set of active users and their long-term SINRs. Recall that each PDCCH in $\{\mathcal{P}_u\}_{u=1}^{K}$ maps to a unique virtual user. To ensure that the PDCCHs assigned to two virtual users corresponding to two different users do not overlap, we can define multiple binary knapsack constraints. Clearly, $R$ such knapsack constraints suffice (indeed, this can be many more than needed), where each constraint corresponds to one CCE and has a weight of one for every pair $(\mathcal{U}, c) \in \mathcal{M}$ wherein $\mathcal{U}$ contains a virtual user corresponding to a PDCCH that includes that CCE. A useful consequence of the fact that in LTE the set $\mathcal{P}_u$ for each user $u$ is extracted from $\mathcal{P}$ via a well-designed hash function (which accepts each user's unique ID as input) is that the resulting knapsack constraints are column-sparse.
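The per-CCE knapsack constraints described above can be sketched as follows, with hypothetical names and the same `(subset, chunk)` representation of a pair $(\mathcal{U}, c)$, where each virtual user is a `(u, pdcch)` tuple. Row $r$ lists the pairs that carry weight one in the constraint for CCE $r$; a selection of pairs is feasible when each row contains at most one selected pair. The function `column_sparsity` reports the largest number of constraints in which any single pair has a nonzero weight, i.e., the column sparsity of the constraint matrix.

```python
def cce_knapsack_rows(pairs, num_cce):
    """rows[r] lists the indices of pairs (subset, chunk) whose subset contains a
    virtual user (u, pdcch) with CCE r in its PDCCH; each such pair has weight one."""
    rows = [[] for _ in range(num_cce)]
    for idx, (subset, _chunk) in enumerate(pairs):
        cces = {cce for _u, pdcch in subset for cce in pdcch}
        for r in sorted(cces):
            rows[r].append(idx)
    return rows

def satisfies_knapsacks(selected, rows):
    """Binary knapsack check: each CCE row may contain at most one selected pair."""
    chosen = set(selected)
    return all(sum(1 for idx in row if idx in chosen) <= 1 for row in rows)

def column_sparsity(rows, num_pairs):
    """Largest number of CCE constraints in which any single pair has weight one."""
    counts = [0] * num_pairs
    for row in rows:
        for idx in row:
            counts[idx] += 1
    return max(counts, default=0)
```

Since a pair contains at most $T$ virtual users and each PDCCH spans at most 8 CCEs, every column of this matrix has at most $8T$ nonzero entries, however large $R$ is.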
Figure 3. Average spectral efficiency versus SNR (dB): MU Scheduling with SIC receiver.
Figure 4. Average spectral efficiency versus SNR (dB): MU Scheduling with MMSE and Antenna Selection.
Figure 6. Normalized spectral efficiency versus SNR (dB).
Figure 7. Normalized complexity versus SNR (dB).
Figure 8. Normalized complexity versus SNR (dB). Chunk sizes are included in complexity computations.
Parameter | Assumption
Deployment scenario | IMT Urban Micro (UMi)
Duplex method and bandwidth | FDD: 10 MHz for uplink
Cell layout | Hex grid, 19 sites, 3 cells/site
Transmission power at user | 23 dBm
Average number of users per sector | 10 or 15
Network synchronization | Synchronized
Antenna configuration (eNB) | 4 RX co-polarized antennas, 0.5λ spacing
Antenna configuration (user) | 1 TX antenna
Uplink transmission scheme | Dynamic MU scheduling; MU pairing: max. 2 users/RB, aligned pairing
Fractional power control | $P_0 = -85$ dB, $\alpha = 0.8$
Uplink scheduler | PF in time and frequency
Scheduling granularity | 1 RB
CQI assumptions | 1 ms periodicity and 7 ms delay; CQI without errors
Uplink HARQ scheme | Synchronous, non-adaptive Chase combining
Uplink receiver type | MMSE-IRC and SIC-IRC
Channel estimation error | NA

Table IV
Parameters for system level simulations



Scheduling method | Cell average | 5% cell-edge
LRT SU | 1.6214 | 0.0655
LRT MU with MMSE | 1.9246 (18.70%) | 0.0524
LRT MU with SIC | 2.0651 (27.37%) | 0.0745
LRT-Sequential MU with MMSE | 1.8196 (12.22%) | 0.0627
LRT-Sequential MU with SIC | 1.9537 (20.5%) | 0.0665

Table V
Spectral efficiency of LRT-based SU and MU UL scheduling schemes. An average of 10 users are present in each cell, and all associated active users can be scheduled in each interval. Percentages in parentheses are cell-average gains relative to LRT SU.



LRT-MU scheduling with | Cell average | 5% cell-edge
Knapsack constraint | 1.7833 | 0.0266
Pre-selection 1 | 1.7940 (0.6%) | 0.0419 (57.52%)
Pre-selection 2 | 1.7908 (0.4%) | 0.0414 (55.64%)
Pre-selection 3 | 1.8265 (2.42%) | 0.0444 (66.92%)

Table VI
Spectral efficiency of MU UL scheduling schemes with MMSE receiver. An average of 15 users are present in each cell, and at most 7 first-transmission users can be scheduled in each interval. Percentages in parentheses are gains relative to LRT-MU scheduling with the knapsack constraint.
LRT-MU scheduling with | Cell average | 5% cell-edge
Knapsack constraint | 1.8865 | 0.0411
Pre-selection 1 | 2.0082 (6.45%) | 0.0527 (28.22%)
Pre-selection 2 | 1.8980 (0.61%) | 0.0451 (9.73%)
Pre-selection 4 | 2.1069 (11.68%) | 0.0531 (29.2%)

Table VII
Spectral efficiency of MU UL scheduling schemes with SIC receiver. An average of 15 users are present in each cell, and at most 7 first-transmission users can be scheduled in each interval. Percentages in parentheses are gains relative to LRT-MU scheduling with the knapsack constraint.
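The parenthesized percentages in Tables V–VII can be reproduced as relative gains over the first row of each table. The sketch below checks this reading using only the values reported in the tables; the dictionary layout and function names are illustrative.

```python
# (baseline, [(reported value, reported gain in %), ...]); baseline = first table row.
TABLES = {
    "V, cell average": (1.6214, [(1.9246, 18.70), (2.0651, 27.37),
                                 (1.8196, 12.22), (1.9537, 20.5)]),
    "VI, cell average": (1.7833, [(1.7940, 0.6), (1.7908, 0.4), (1.8265, 2.42)]),
    "VI, 5% cell-edge": (0.0266, [(0.0419, 57.52), (0.0414, 55.64), (0.0444, 66.92)]),
    "VII, cell average": (1.8865, [(2.0082, 6.45), (1.8980, 0.61), (2.1069, 11.68)]),
    "VII, 5% cell-edge": (0.0411, [(0.0527, 28.22), (0.0451, 9.73), (0.0531, 29.2)]),
}

def relative_gain(value, baseline):
    """Gain of value over baseline, in percent."""
    return 100.0 * (value / baseline - 1.0)

def max_discrepancy(tables=TABLES):
    """Largest gap between a recomputed and a reported gain, in percentage points."""
    worst = 0.0
    for baseline, rows in tables.values():
        for value, reported in rows:
            worst = max(worst, abs(relative_gain(value, baseline) - reported))
    return worst
```

All recomputed gains agree with the reported ones up to the rounding used in the tables (one or two decimals).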